Full Version: Psy models: will they ever be able to replace listening tests?
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific/R&D Discussion
Kees de Visser
Double-blind listening tests are still the preferred method for evaluating whether differences are audible.
Those tests can consume a lot of time and budget. I'm curious about the current state of psychoacoustic models with respect to their ability to simulate listening tests.
How far off are the objective results compared to the subjective ones? Will psy models ever be able to replace listening tests, or are there theoretical limitations?

Any input much appreciated.

Kees de Visser

muaddib
It is possible to predict very accurately what the average score in a listening test would be for lower bitrates (up to 96 kbps). Yet there are still some outliers (cases that cannot be judged correctly).
For higher bitrates, as far as I am aware, it is not possible to get precise predictions, but in my opinion that is because there are not enough test results that are good (accurate) enough.
I believe that it is possible to get very, very accurate results, but there are still two problems: 1. Not all types of "distortions" can be measured accurately enough, 2. Lack of precise data from listening tests
As an example of problem 2, take a look at some of the results from previous listening tests. There are cases where one listener gave two different grades to the same encoding in different tests (48kbps and 64kbps Mares), even though the tested sample (at 96kbps) was exactly the same.
To fix problem 1, more research is needed.
Kees de Visser
QUOTE(muaddib @ Dec 20 2007, 10:58) *
1. Not all types of "distortions" can be measured accurately enough
What types do you have in mind? I had the impression that with modern equipment, signals far beyond hearing capabilities can be measured.
QUOTE
2. Lack of precise data from listening tests
As an example of problem 2, take a look at some of the results from previous listening tests. There are cases where one listener gave two different grades to the same encoding in different tests (48kbps and 64kbps Mares), even though the tested sample (at 96kbps) was exactly the same.
That's my point as well: a psy model might be inaccurate, but so are the results of listening tests. Reliability increases with the number of subjects and trials.
At least the results of an inaccurate psy model are constant. Another difference IMO is that it's relatively easy to modify the psy model and to redo the test (assuming the source material is still available). This is much more difficult with a listening test.

muaddib
QUOTE(Kees de Visser @ Dec 20 2007, 14:02) *
QUOTE(muaddib @ Dec 20 2007, 10:58) *
1. Not all types of "distortions" can be measured accurately enough
What types do you have in mind? I had the impression that with modern equipment, signals far beyond hearing capabilities can be measured.

For example, in my experiments it was impossible to correctly measure the double talk that occurs in the German speaker sample often used in listening tests without sacrificing accuracy for other types of problems that might occur in encoders.
Garf
QUOTE(Kees de Visser @ Dec 20 2007, 14:02) *
What types do you have in mind? I had the impression that with modern equipment, signals far beyond hearing capabilities can be measured.


The problem isn't capturing the signal, it's determining and quantifying how the human hearing system perceives it and grades the distortions. You need to have some way of algorithmically quantifying the distortions.

I believe what muaddib is saying is that testing "quantifiers" for this would be a lot easier with better listening test results, because the correlation can be measured more easily and accurately.
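To make the correlation idea concrete, here is a minimal sketch of how one candidate "quantifier" could be compared against listening-test means using the Pearson correlation coefficient. The score lists are hypothetical values invented purely for illustration, and `pearson_r` is just a helper name chosen here; nothing in it comes from a real test or a specific psy model.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical numbers, for illustration only (not from any real test):
model_scores = [4.6, 3.9, 3.1, 2.4, 4.2]    # quality predicted by a psy model
listener_means = [4.4, 4.0, 2.8, 2.6, 4.3]  # mean listener grade per sample

r = pearson_r(model_scores, listener_means)
print(f"r = {r:.3f}")
```

The noisier the listener means are, the harder it becomes to tell a good quantifier from a mediocre one, which is presumably why better listening-test data would help.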

QUOTE

At least the results of an inaccurate psy model are constant.


I wouldn't consider this an advantage. Given the choice of a systematic error from a psymodel and a sampling error from a listening test, I'll take the sampling error every time.
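The distinction can be sketched numerically. Under the simplifying assumptions that listeners are independent and that a well-run listening test is unbiased (both are assumptions of this sketch, not measured facts), the RMS error of an averaged result is sqrt(bias^2 + sd^2 / n). The bias and spread values below are made up for illustration.

```python
from math import sqrt

def rms_error(bias, sd, n):
    """RMS error of the mean of n independent readings with given bias and spread."""
    return sqrt(bias ** 2 + sd ** 2 / n)

for n in (1, 4, 16, 64):
    listening = rms_error(bias=0.0, sd=1.0, n=n)  # unbiased but noisy
    model = rms_error(bias=0.3, sd=0.0, n=n)      # repeatable but biased
    print(n, round(listening, 3), round(model, 3))
```

Adding listeners or trials shrinks the sampling error, while rerunning a biased model any number of times returns the same biased answer.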

muaddib
QUOTE(Garf @ Dec 20 2007, 14:49) *
QUOTE(Kees de Visser @ Dec 20 2007, 14:02) *
What types do you have in mind? I had the impression that with modern equipment, signals far beyond hearing capabilities can be measured.
The problem isn't capturing the signal, it's determining and quantifying how the human hearing system perceives it and grades the distortions. You need to have some way of algorithmically quantifying the distortions.

I believe what muaddib is saying is that testing "quantifiers" for this would be a lot easier with better listening test results, because the correlation can be measured more easily and accurately.

Yeap wink.gif
Paulhoff
The problem with DBTs is that some people do not get the results they want, and the results show that they don't hear as well as they like to think they do. So they blame the DBT, and not themselves, for the failure.

Yes, tests cost money, welcome to life.

Paul

smile.gif smile.gif smile.gif
muaddib
QUOTE(Paulhoff @ Dec 20 2007, 21:05) *
The problem with DBTs is that some people do not get the results they want, and the results show that they don't hear as well as they like to think they do. So they blame the DBT, and not themselves, for the failure.

I don't see a connection with the posted question dry.gif
Paulhoff
QUOTE(muaddib @ Dec 21 2007, 07:26) *

I don't see a connection with the posted question dry.gif

Too bad. The only way to get a true test is to use a DBT, but again, there are those who don't get the results they want, because the tests show that they don't hear what they think they hear (as with speaker cable), so they point to the test as the failure and not to themselves and/or their beliefs. So the DBT is not the preferred test for many so-called audiophiles and sellers of snake-oil equipment and add-ons, and that group may be much bigger than you think it is.

Paul

smile.gif smile.gif smile.gif
muaddib
QUOTE(Paulhoff @ Dec 21 2007, 14:50) *
QUOTE(muaddib @ Dec 21 2007, 07:26) *
I don't see a connection with the posted question dry.gif
Too bad. The only way to get a true test is to use a DBT, but again, there are those who don't get the results they want, because the tests show that they don't hear what they think they hear (as with speaker cable), so they point to the test as the failure and not to themselves and/or their beliefs. So the DBT is not the preferred test for many so-called audiophiles and sellers of snake-oil equipment and add-ons, and that group may be much bigger than you think it is.

I suppose the basic assumption in the original question was that only a correctly organized test (which includes ABX, double-blind testing, etc.) is acceptable. The question here is not whether an incorrectly organized test should be accepted. What we are discussing is what problems might arise in a well-organized test, and whether a listening test can be replaced by an automatic psychoacoustic test.
Kees de Visser
QUOTE(Paulhoff @ Dec 20 2007, 21:05) *
Yes, tests cost money, welcome to life.
One of the potential problems of expensive tests is that they often require sponsoring which can undermine the objectivity of the test.
I remember an AES convention forum discussion where a CEO of Crystal Semiconductor (now Cirrus) stated that he would be prepared to donate a considerable amount into scientific research to once and for all determine the limits of human hearing. I imagine he was fed up with the continuous demand for higher quality by the audio community, often without scientific justification.
Wouldn't it make sense to concentrate effort and budget (a few million?) on designing an acceptably accurate psy model instead of performing isolated listening tests over and over again ?
A lot of listening tests are met with scepticism, so I don't see the difference with psy-model testing in that respect.

Paulhoff
QUOTE(Kees de Visser @ Dec 21 2007, 09:16) *
One of the potential problems of expensive tests is that they often require sponsoring which can undermine the objectivity of the test.
I remember an AES convention forum discussion where a CEO of Crystal Semiconductor (now Cirrus) stated that he would be prepared to donate a considerable amount into scientific research to once and for all determine the limits of human hearing.

The limits of human hearing are already known, just ask James Johnston, the co-inventor of MP3 and the main inventor of AAC. Without knowing the limits of human hearing, these digital formats could not have been created.

Paul

smile.gif smile.gif smile.gif
muaddib
QUOTE(Paulhoff @ Dec 21 2007, 15:29) *
The limits of human hearing are already known, just ask James Johnston, the co-inventor of MP3 and the main inventor of AAC. Without knowing the limits of human hearing, these digital formats could not have been created.

Exact "limits" free of mistakes are not known, only approximations. One more problem is that the "limits" are not the same for all people. So you take measurements for a group of people, take the average, and say: OK, we use these values for our encoder. Then along comes somebody with better hearing than any of your subjects (hrghrm guruboolez) and finds many problems in your encoder.
Another problem, as I already wrote, is that it is very hard to determine the hearing capabilities of even a single listener. A single listener who takes the same test twice on different occasions will reach different conclusions (hear differently).

QUOTE
Wouldn't it make sense to concentrate effort and budget (a few million?) on designing an acceptably accurate psy model instead of performing isolated listening tests over and over again ?

Please point me to the person who would invest a "few million" in this kind of research. And in my opinion, "few" in this case does not mean 2 or 3, but rather much more.
Garf
QUOTE(Paulhoff @ Dec 21 2007, 15:29) *

The limits of human hearing are already known, just ask James Johnston, the co-inventor of MP3 and the main inventor of AAC. Without knowing the limits of human hearing, these digital formats could not have been created.


You don't need to know the limits exactly to get something working.

You just need something that works well enough.

MP3 and AAC work fine but they are certainly not the final word in audio coding. Neither is our knowledge of the human auditory system complete.


QUOTE(Kees de Visser @ Dec 21 2007, 15:16) *

Wouldn't it make sense to concentrate effort and budget (a few million?) on designing an acceptably accurate psy model instead of performing isolated listening tests over and over again ?


How are you going to make such a psymodel without doing listening tests first? (That was one of the original points of muaddib!)

QUOTE

A lot of listening tests are met with scepticism, so I don't see the difference with psy-model testing in that respect.


Gee, some people are sceptical about evolution, too. That doesn't mean creationism is an acceptable alternative.

Paulhoff
QUOTE(muaddib @ Dec 21 2007, 10:37) *

Exact "limits" free of mistakes are not known, only approximations. One more problem is that the "limits" are not the same for all people.

That not all people are the same in all things is a given. But the limits are known for people who have no hearing handicaps. How the ear hears is well known. They also know now how the bee flies, and that more than 10% of the brain is used.

Paul

smile.gif smile.gif smile.gif

QUOTE(Garf @ Dec 21 2007, 10:47) *

You don't need to know the limits exactly to get something working.

You just need something that works well enough.

MP3 and AAC work fine but they are certainly not the final word in audio coding. Neither is our knowledge of the human auditory system complete.

If you don't care about the output of your work, that is true; if you do care, then it is not true. There is a hell of a lot more known about hearing than most so-called audio people let on.

Paul

smile.gif smile.gif smile.gif

How else can they sell snake-oil.
Lyx
Something that hasn't been considered yet in this thread is that it's not just the listener side that is to some extent relative, but also the data side. For low-to-medium bitrate ranges, a utopian psymodel may be useful. However, at high bitrates, stuff is already transparent to most people in the majority of cases. What matters at high bitrates isn't so much "normal listening situations" anymore; it's all about the unusual cases and killer samples.

What I'm implying here is that different tactics may be efficient, depending on whether you are in the low-to-medium bitrate arena or the high bitrate arena. At high bitrates, a more conventional approach may make sense: playing it safe to reduce the number of "rare exception cases".

This difference is also visible in the motivations of people who encode music. People who go for a maximum of 160kbit average don't care that much about rare and minor problems; they go for efficiency, something that is "good enough" at reasonable average bitrates. People who encode at 190kbps average and up usually want to play it safe: they want quality that is as stable as possible, with nearly no exceptions.

However, would such a utopian psymodel be able to fix this use case (high bitrate encoding)? Or does this have more to do with knowledge about situations which are likely to be difficult for the encoder to handle?
Woodinville
There are some issues. Calculating most-sensitive thresholds is pretty much a doable thing nowadays, albeit with rather a lot of computation.

Unfortunately, when the change is audible, different people very often have different opinions about the impairment, or may in fact regard a change as an improvement.

So, using mechanical evaluation may someday have life for transparency, in my opinion, although I see nothing (public) satisfactory for stereo or spatial hearing yet (but it is possible, that I'm sure of).

But once you're over threshold, I suspect the answer is that "opinions will vary", and that's not going to be something to easily evaluate with mechanical processes.
NEMO7538
I have probably missed something in this discussion, but it seems to me that the question is irrelevant.
Sure, a psy model (used for validation) would agree with itself (used for encoding) on what the best strategy for compression is. Isn't that what is already built into the quantization loops of most encoders?

I'll be happy to hear where the fault in my reasoning is, if I made myself clear.
muaddib
QUOTE(Lyx @ Dec 21 2007, 23:22) *
However, would such a utopian psymodel be able to fix this use case (high bitrate encoding)? Or does this have more to do with knowledge about situations which are likely to be difficult for the encoder to handle?

There is not enough publicly available data from listening tests to create such a specific model. It is hard to gather all the special cases and enough people who can hear the problems in those cases. It is also hard to implement such a good model without any bugs. I guess nobody is interested in financing that, either.

QUOTE(NEMO7538 @ Dec 22 2007, 08:43) *
Sure, a psy model (used for validation) would agree with itself (used for encoding) on what the best strategy for compression is. Isn't that what is already built into the quantization loops of most encoders?

Here we are discussing whether a psy model can correctly model a human listener, not whether a psy model would agree with itself. Encoders impose additional constraints during encoding because of quantization. The psy model is there to help the encoder make the best decisions about quantization steps.

QUOTE(Woodinville @ Dec 22 2007, 06:48) *
So, using mechanical evaluation may someday have life for transparency, in my opinion, although I see nothing (public) satisfactory for stereo or spatial hearing yet (but it is possible, that I'm sure of).

As far as I am aware, this is again the problem of not having enough data from listening tests. For most people, stereo problems are not important and are very often even impossible to detect. Yet there are a few people who dislike even the smallest change in the stereo image.
Paulhoff
QUOTE(Kees de Visser @ Dec 20 2007, 04:40) *
Will psy models ever be able to replace listening or are there theoretical limitations ?

Any input much appreciated.

Kees de Visser

Will any model ever replace people's senses in anything that can be subjective? I doubt it.

Paul

smile.gif smile.gif
Garf
QUOTE(NEMO7538 @ Dec 22 2007, 08:43) *
I have probably missed something in this discussion, but it seems to me that the question is irrelevant.
Sure, a psy model (used for validation) would agree with itself (used for encoding) on what the best strategy for compression is. Isn't that what is already built into the quantization loops of most encoders?

I'll be happy to hear where the fault in my reasoning is, if I made myself clear.


An encoder works under 2 constraints that a hypothetical perfect psymodel doesn't have:

a) A restricted bitrate, which means that the encoder will have to produce audible distortion, and has to guess where this distortion is the least offensive. (This is already something which we don't understand fully.)

b) Speed. A practical encoder must work reasonably fast. This isn't so important for a model that evaluates the encoders.



QUOTE(Paulhoff @ Dec 21 2007, 21:04) *

QUOTE(Garf @ Dec 21 2007, 10:47) *

You don't need to know the limits exactly to get something working.

You just need something that works well enough.

MP3 and AAC work fine but they are certainly not the final word in audio coding. Neither is our knowledge of the human auditory system complete.

If you don't care about the output of your work, that is true; if you do care, then it is not true. There is a hell of a lot more known about hearing than most so-called audio people let on.


I really don't understand at all what you are trying to say.

What are you claiming or saying exactly?

Are you saying MP3 and AAC cannot be improved upon?
Are you saying our understanding of the human hearing system is complete?
boombaard
QUOTE

QUOTE(Paulhoff @ Dec 21 2007, 21:04) *

QUOTE(Garf @ Dec 21 2007, 10:47) *

You don't need to know the limits exactly to get something working.

You just need something that works well enough.

MP3 and AAC work fine but they are certainly not the final word in audio coding. Neither is our knowledge of the human auditory system complete.

If you don't care about the output of your work, that is true; if you do care, then it is not true. There is a hell of a lot more known about hearing than most so-called audio people let on.


I really don't understand at all what you are trying to say.

What are you claiming or saying exactly?

Are you saying MP3 and AAC cannot be improved upon?
Are you saying our understanding of the human hearing system is complete?


not sure either.. my best guess would be a language barrier issue (that is, an interpretative mistake on paul's part)

anyway, what i really wanted to say: if i recall correctly, a while ago (anywhere up to 2 years?) someone suggested doing a test (and did, though i don't recall how many people contributed) to see how annoying people found the different possible artifacts that (can) occur during encoding.. isn't this also a fairly interesting thing to try WRT point 1 that muaddib mentioned in post #2?
Paulhoff
QUOTE(Garf @ Dec 22 2007, 14:38) *

Are you saying MP3 and AAC cannot be improved upon?
Are you saying our understanding of the human hearing system is complete?

AAC is an improvement over MP3.

How complete do you need it?

Paul

smile.gif smile.gif smile.gif
knutinh
I see a philosophical dilemma when evaluating lossy compression using auditory models.

The lossy codecs themselves use models of the human auditory system. If we assume that a "state of the art" encoder contains a "state of the art" auditory model to allocate bits as best we can with current knowledge, then there will be no "even more state of the art model" to evaluate the codec?


Are there believed to be some upper bounds on lossy compression efficiency? I mean, if you put together a team of DSP and audio people to manually encode a song with unlimited resources, how much better than a general codec could they do (excluding trivial solutions such as building a decoder that essentially contains the single song to be played already in memory)?

I believe that DVD video is encoded by "compressionists" doing multiple passes and manually instructing the encoder at places.

-k
boombaard
QUOTE(knutinh @ Dec 23 2007, 02:38) *

I see a philosophical dilemma when evaluating lossy compression using auditory models.

The lossy codecs themselves use models of the human auditory system. If we assume that a "state of the art" encoder contains a "state of the art" auditory model to allocate bits as best we can with current knowledge, then there will be no "even more state of the art model" to evaluate the codec?

well, that's where i figure the 'ear' comes in, really.. maybe not everyone's, but Guru will most likely be around for a while longer ;-)

QUOTE(knutinh @ Dec 23 2007, 02:38) *

Are there believed to be some upper bounds on lossy compression efficiency? I mean, if you put together a team of DSP and audio people to manually encode a song with unlimited resources, how much better than a general codec could they do (excluding trivial solutions such as building a decoder that essentially contains the single song to be played already in memory)?


there are, yes.. ultimately, that'll be reached once the psymodel is a 1:1 interpreter of the human auditory system, but even with the current models out there I think we're pretty close to that..
Remember, compression (for audio) is removing superfluous information (the 'unheard' stuff that's taken away by the encoder), and possibly looking for signal repetition..
As the latter is very hard to find (the repeating patterns), and since you can't really do what they do in video (where you can basically save a reference frame+diff for more efficient storage of similar frames), since complex waves are such pains in the ass (IIRC from my complex analysis course ;-)), you're looking at a limit that's dependent on the condition of the human ear.
and while it can be argued that a large percentage of today's youth is happily trying to become deaf before turning 40, i'm not sure that's something you can depend upon them succeeding tongue.gif
(that's also more or less why lossless codecs are having trouble becoming more efficient, btw)


ps. IANAD
Kees de Visser
QUOTE(Garf @ Dec 21 2007, 16:47) *
How are you going to make such a psymodel without doing listening tests first? (That was one of the original points of muaddib!)
I've never doubted that designing a psy model involves lots and lots of listening tests. But the advantage would be that once a model has been defined, it can do its job very fast and with constant results.
QUOTE(Woodinville @ Dec 22 2007, 06:48) *
So, using mechanical evaluation may someday have life for transparency, in my opinion, although I see nothing (public) satisfactory for stereo or spatial hearing yet (but it is possible, that I'm sure of).
I would expect mechanical evaluation to be mostly and especially usable for transparency verification (just like ABX/ABCHR). Audio codecs are just a small target of interest. What about audible effects of opamps, electrical components, cables, AD/DA converters etc.? It would be interesting to know if time and budget spent on R&D for a product has an effect on sonic performance at all.
There will probably be a need for several psy models, such as one for xx% of the average population and a "golden-ear" world's best (no-one can hear better). As soon as a reliable listening test reveals new threshold data, the psy model could be updated. An audio device designer could check his modifications instantaneously with a psy model, which is much more practical than arranging a golden-ear listening test. I can also imagine that mechanical evaluation could give an estimate of how far below audibility something is.
In short: IF an accurate psy-model can be made, I can see mostly advantages over listening tests.
Gabriel
QUOTE(muaddib @ Dec 22 2007, 15:32) *

QUOTE(Woodinville @ Dec 22 2007, 06:48) *
So, using mechanical evaluation may someday have life for transparency, in my opinion, although I see nothing (public) satisfactory for stereo or spatial hearing yet (but it is possible, that I'm sure of).

As far as I am aware, this is again the problem of not having enough data from listening tests. For most people, stereo problems are not important and are very often even impossible to detect. Yet there are a few people who dislike even the smallest change in the stereo image.

It's not only a spatial/soundstage problem; there is also multichannel masking/unmasking.
knutinh
QUOTE(boombaard @ Dec 23 2007, 04:35) *

QUOTE(knutinh @ Dec 23 2007, 02:38) *

I see a philosophical dilemma when evaluating lossy compression using auditory models.

The lossy codecs themselves use models of the human auditory system. If we assume that a "state of the art" encoder contains a "state of the art" auditory model to allocate bits as best we can with current knowledge, then there will be no "even more state of the art model" to evaluate the codec?

well, that's where i figure the 'ear' comes in, really.. maybe not everyone's, but Guru will most likely be around for a while longer ;-)

QUOTE(knutinh @ Dec 23 2007, 02:38) *

Are there believed to be some upper bounds on lossy compression efficiency? I mean, if you put together a team of DSP and audio people to manually encode a song with unlimited resources, how much better than a general codec could they do (excluding trivial solutions such as building a decoder that essentially contains the single song to be played already in memory)?


there are, yes.. ultimately, that'll be reached once the psymodel is a 1:1 interpreter of the human auditory system, but even with the current models out there I think we're pretty close to that..
Remember, compression (for audio) is removing superfluous information (the 'unheard' stuff that's taken away by the encoder), and possibly looking for signal repetition..
As the latter is very hard to find (the repeating patterns), and since you can't really do what they do in video (where you can basically save a reference frame+diff for more efficient storage of similar frames), since complex waves are such pains in the ass (IIRC from my complex analysis course ;-)), you're looking at a limit that's dependent on the condition of the human ear.
and while it can be argued that a large percentage of today's youth is happily trying to become deaf before turning 40, i'm not sure that's something you can depend upon them succeeding tongue.gif
(that's also more or less why lossless codecs are having trouble becoming more efficient, btw)


ps. IANAD

A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note on/off commands, velocity, etc (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can be compressed (lossless or near lossless) probably very efficiently. A lot better than the General Midi attempt of the mid 90s, and probably more versatile than the synthetic audio parts of MPEG4?


Before dismissing this, perhaps one should consider just how many of the musical events in a typical radio song are made just like this.

-k
Woodinville
QUOTE(knutinh @ Dec 30 2007, 17:41) *
A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note on/off commands, velocity, etc (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can be compressed (lossless or near lossless) probably very efficiently. A lot better than the General Midi attempt of the mid 90s, and probably more versatile than the synthetic audio parts of MPEG4?


Before dismissing this, perhaps one should consider just how many of the musical events in a typical radio song are made just like this.

-k


Unfortunately, that's not a very useful bound, given the addition of things like reverberation, human operation of controls, singing, etc.

An older attempt at this was Johnston, J. D., “Estimation of perceptual entropy using noise masking criteria,” ICASSP '88 Record, 1988, pp. 2524-2527. and such a work ought to be at least as possible today. I am aware of newer measurements made, but only mentioned peripherally in publication, that put pure transparency at about 1.1 bits/sample for some complex material.
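To put the 1.1 bits/sample figure in more familiar units, here is a small conversion sketch. It assumes a 44.1 kHz sample rate and treats the figure as a per-channel rate; neither assumption is stated in the post, so the results are only a rough orientation.

```python
def bitrate_kbps(bits_per_sample, sample_rate_hz=44100, channels=1):
    """Convert a bits-per-sample figure into kbit/s (assumed rate and channel count)."""
    return bits_per_sample * sample_rate_hz * channels / 1000

print(round(bitrate_kbps(1.1), 2))              # one channel at 44.1 kHz
print(round(bitrate_kbps(1.1, channels=2), 2))  # two channels coded independently
```

Under those assumptions the figure corresponds to roughly 48.5 kbit/s per channel for the complex material mentioned.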

Before suggesting that all is MIDI, one must recall not only preprocessing, but also random issues in synthesizers. Not all random, uniform, flat streams necessarily sound the same, you have to consider both short-term and long-term statistics, something that many random number generators fail at. When you use something like that for your cymbals, how much of the ***audible information*** there is random number generator state? You might be surprised.

It's even worse with real instruments, almost all of which, while they maintain things like pitch splendidly, exhibit pitch jitter due to basic physics that can be heard in at least some cases. Your entropy estimates have to capture all the random elements that are audible. Measuring the MIDI rate isn't necessarily going to do that.
digital
.
Well, we have nearly 42000 members here at HAF. I propose that we post sample files representative of any two given audio scenarios for folks to audition, stating that answers are acceptable only if presented with detailed ABX results, and base findings on these results.

Scenario: We post two files, for instance one 400 kb/s .OGG (what OGG developers refer to as ‘10’ on the OGG scale) converted to .WAV, and the other a pure .WAV file. We title the files appropriately so folks can sit down and familiarize themselves with the correctly labeled files. We also post two more files with irrelevant names such as Orange.wav and Yellow.wav (the same two files as above, but with no way for folks to tell which is which).

We ask participants to ABX the two 'color' files and give us their ABX evaluation sheet ‘print-outs’ and base our findings on these results. With 42000 members, we are bound to get at least 1000 replies. From those thousand, we should be able to extract solid data, no?
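For reference, an individual ABX sheet is usually judged with a one-sided binomial test against pure guessing. The sketch below computes that p-value; the 12-of-16 score and the 1000-reply count are illustrative numbers, not results from any actual test.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

p = abx_p_value(12, 16)
print(f"p = {p:.4f}")  # chance of 12 or more correct out of 16 by luck

# If 1000 people who hear no difference each submit a 16-trial sheet,
# this many are expected to reach 12/16 or better by luck alone:
print(round(1000 * p))
```

So with a large number of replies, a few dozen "passing" sheets would be expected even if nobody could hear a difference, which any pooled analysis would need to take into account.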

Andrew D.
www.cdnav.com

.
Soap
QUOTE(digital @ Dec 31 2007, 05:23) *

.
We ask participants to ABX the two 'color' files and give us their ABX evaluation sheet ‘print-outs’ and base our findings on these results. With 42000 members, we are bound to get at least 1000 replies. From those thousand, we should be able to extract solid data, no?
.


How do you prevent bad apples from determining which 'colour' file is which?
digital
.
Oddly enough, if one takes (from what I have seen thus far), any file, say a 128Kb/s MP3, and converts it into a .WAV format, the file size increases to be exactly the same as a 'standard' .WAV file. (Can someone please explain why this is? It surprised the hell outta’ me).

Unless the 'bad-apples' use some kind of HEX file editor utility or (???), to discern what the file was originally comprised of, all should be A-Ok. However, you are correct in assuming that there will always be numbskulls who wish to screw with people's minds... I guess one would have to have a margin of error or such to discard odd-balls and buttheads... I dunno...

Andrew D.

.
Paulhoff
QUOTE(digital @ Dec 31 2007, 19:48) *

.
Oddly enough, if one takes (from what I have seen thus far), any file, say a 128Kb/s MP3, and converts it into a .WAV format, the file size increases to be exactly the same as a 'standard' .WAV file. (Can someone please explain why this is? It surprised the hell outta’ me).

Unless the 'bad-apples' use some kind of HEX file editor utility or (???), to discern what the file was originally comprised of, all should be A-Ok. However, you are correct in assuming that there will always be numbskulls who wish to screw with people's minds... I guess one would have to have a margin of error or such to discard odd-balls and buttheads... I dunno...

Andrew D.

.

The .wav file is the same size because the duration is the same: CD-quality PCM takes 176400 bytes a second (44100 samples x 2 channels x 2 bytes), even if the audio is silent.
Paul

:) :) :)
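The arithmetic behind that figure can be sketched as follows. This assumes CD-style PCM (44.1 kHz, stereo, 16-bit), which is what the 176400 bytes/second number implies; the helper names are just for illustration, and the small WAV header is ignored.

```python
def pcm_bytes_per_second(sample_rate=44100, channels=2, bits_per_sample=16):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate * channels * bits_per_sample // 8

def pcm_data_bytes(seconds, **fmt):
    """Size of the audio data for a clip of the given duration."""
    return int(seconds * pcm_bytes_per_second(**fmt))

print(pcm_bytes_per_second())   # 176400
print(pcm_data_bytes(180))      # a 3-minute track: 31752000

# A 128 kb/s MP3 carries 16000 bytes per second, so decoding it to PCM
# inflates it by the same factor no matter what the music contains.
mp3_bytes_per_second = 128_000 // 8
print(pcm_bytes_per_second() / mp3_bytes_per_second)  # 11.025
```

The decoded size depends only on duration and PCM format, which is why a decoded MP3 and an original rip of the same track end up the same size.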
Woodinville
QUOTE(Soap @ Dec 31 2007, 03:03) *

QUOTE(digital @ Dec 31 2007, 05:23) *

.
We ask participants to ABX the two 'color' files and give us their ABX evaluation sheet ‘print-outs’, and we base our findings on these results. With 42000 members, we are bound to get at least 1000 replies. From those thousand, we should be able to extract solid data, no?
.


How do you prevent bad apples from determining which 'colour' file is which?


Put in some controls, and see what happens when people answer them.
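One way such controls might be used, as a sketch under assumptions that are not spelled out in the thread: each participant also gets a hidden pair of identical files (nobody can score above chance on it by ear) and a pair with an obvious difference (any attentive listener should pass it). The sheet layout, field names and the 0.05 cut-off are all hypothetical.

```python
from math import comb

def p_at_least(correct, trials):
    """Chance of at least `correct` right answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def screen(sheet, alpha=0.05):
    """Return the reasons (if any) to set a participant's sheet aside.

    `sheet` maps each control name to a (correct, trials) tuple.
    """
    reasons = []
    # Scoring significantly above chance on identical files cannot come from
    # listening, so it hints at file inspection (or plain luck).
    if p_at_least(*sheet["identical"]) < alpha:
        reasons.append("distinguished identical files")
    # Failing an obvious difference suggests careless or random answers.
    if p_at_least(*sheet["obvious"]) >= alpha:
        reasons.append("missed obvious difference")
    return reasons

print(screen({"identical": (8, 16), "obvious": (16, 16)}))   # []
print(screen({"identical": (16, 16), "obvious": (16, 16)}))  # ['distinguished identical files']
```

With this cut-off, roughly one honest participant in twenty would still trip the first check by luck, so a flag is a reason to look closer rather than proof of cheating.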
knutinh
QUOTE(Woodinville @ Dec 31 2007, 07:51) *

QUOTE(knutinh @ Dec 30 2007, 17:41) *
A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note on/off commands, velocity, etc (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can be compressed (lossless or near lossless) probably very efficiently. A lot better than the General Midi attempt of the mid 90s, and probably more versatile than the synthetic audio parts of MPEG4?


Before dismissing this, perhaps one should consider just how much of the musical events in a typical radio song is made just like this.

-k

Unfortunately, that's not a very useful bound, given the addition of things like reverberation, human operation of controls, singing, etc.

I did state that my bound was only considering instrumental, electronically generated music, but that most pop music contains sufficient energy produced in such a fashion that at least those components could be recreated this way (simplifying the compression of natural instruments/voice), if only the separate tracks were available pre-mix and pre-rendering, and a suitable codec were available. Quite possibly, neither is practically possible.

Also, I did include "parametric post-processing" in my argument, just as General MIDI allows (if I recall correctly) a simple reverb/chorus amount per instrument. One could hypothetically replicate the sound engineer's setting of his Lexicon 480L by recording the sound dry and attaching some metadata describing either the preset used or an impulse response to be used for convolution.
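To make the "dry recording plus impulse response" idea concrete, here is a minimal sketch of direct convolution in plain Python. The signal and the three-tap "reverb tail" are toy values invented for illustration; a real room or hardware reverb response would be many thousands of samples long and would normally be applied with FFT-based convolution.

```python
def convolve(dry, impulse_response):
    """Direct (time-domain) convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

dry = [1.0, 0.0, 0.0, 0.5]   # toy "dry" recording
ir = [1.0, 0.5, 0.25]        # toy decaying impulse response (an assumption)
wet = convolve(dry, ir)
print(wet)  # [1.0, 0.5, 0.25, 0.5, 0.25, 0.125]
```

The decoder would only need the dry signal and the short `ir` list to rebuild `wet`, which is the sense in which the effect becomes metadata.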
QUOTE

An older attempt at this was Johnston, J. D., “Estimation of perceptual entropy using noise masking criteria,” ICASSP '88 Record, 1988, pp. 2524-2527. and such a work ought to be at least as possible today. I am aware of newer measurements made, but only mentioned peripherally in publication, that put pure transparency at about 1.1 bits/sample for some complex material.

Before suggesting that all is MIDI, one must recall not only preprocessing, but also random issues in synthesizers. Not all random, uniform, flat streams necessarily sound the same, you have to consider both short-term and long-term statistics, something that many random number generators fail at. When you use something like that for your cymbals, how much of the ***audible information*** there is random number generator state? You might be surprised.

I don't understand what you are saying. Are you saying that if I input the exact same MIDI sequence into my Waldorf microQ all-digital synth, it will sound audibly different each time?

Say I produce a song using a MIDI sequencer, monitoring through a set of plugins for effects etc. Are you saying that a decoder 10 years from now, containing algorithms precisely replicating my monitoring equipment, cannot duplicate the sound that I am hearing?

I would argue that the "random" and "quasi-random" elements explicitly included in synthesizers for sound generation very rarely depend on the specific outcome of the dice thrown. If a sound is mixed with "white noise", 10 different realisations of the same sound would for all intents and purposes sound the same, even though a true random noise source would be radically different on a sample-for-sample basis.
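A small sketch of the sample-for-sample point: two noise realisations from different seeds are essentially uncorrelated sample by sample, yet have nearly the same overall level. The generator, seeds and length are arbitrary choices for illustration, and the code only compares long-term RMS; it says nothing about the short-term statistics raised in the quote above, nor does it prove audible equivalence.

```python
import random
from math import sqrt

def white_noise(n, seed):
    """n samples of uniform white noise in [-1, 1] from a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def rms(x):
    """Root-mean-square level of a signal."""
    return sqrt(sum(v * v for v in x) / len(x))

def correlation(x, y):
    """Pearson correlation between two equal-length signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

a = white_noise(44100, seed=1)   # one second at 44.1 kHz
b = white_noise(44100, seed=2)   # a different realisation

print("RMS levels:", rms(a), rms(b))          # nearly identical
print("correlation:", correlation(a, b))      # close to zero
```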

QUOTE

It's even worse with real instruments, almost all of which, while they maintain things like pitch splendidly, exhibit pitch jitter due to basic physics that can be heard in at least some cases. Your entropy estimates have to capture all the random elements that are audible. Measuring the MIDI rate isn't necessarily going to do that.

What do you mean by pitch jitter?

Of course, I do not mean that something like an acoustic violin equipped with a MIDI sensor generating note-on, note-off, velocity and pitch/pitch-bend information captures the true information of that instrument. A MIDI-equipped piano would probably come a lot closer, but not all the way.

-k
hellokeith
Has anyone in the lossy encoding area done any work on using musical genre? It seems to me that at least encoding speed, and perhaps even accuracy, could be improved if the lossy encoder were given metadata prior to the encode.

This feeds right into psymodels replacing listening tests, since it seems (to me at least) a big problem to make an encoder which works well across all genres.
Garf
QUOTE(hellokeith @ Jan 16 2008, 00:12) *
Has anyone in the lossy encoding area done any work on using musical genre? It seems to me that at least encoding speed, and perhaps even accuracy, could be improved if the lossy encoder were given metadata prior to the encode.

This feeds right into psymodels replacing listening tests, since it seems (to me at least) a big problem to make an encoder which works well across all genres.


I don't see any point in knowing the genre, let alone how it could make the encoder faster or better. If you have any ideas there I'd like to hear them. What could the metadata tell the encoder that it couldn't figure out for itself?

If I could have any metadata for my psymodel, it would be a profile of the listener :) Or the playback level that will be used :)
itv
QUOTE(Garf @ Jan 15 2008, 17:25) *

If I could have any metadata for my psymodel, it would be a profile of the listener :) Or the playback level that will be used :)


The background noise level in the playback situation would be nice to know, too :)
muaddib
QUOTE(itv @ Jan 22 2008, 13:12) *
The background noise level in the playback situation would be nice to know, too :)

It is usually assumed that it is near 0 and it is better to try to achieve this in subjective listening tests.
knutinh
QUOTE(muaddib @ Jan 22 2008, 13:44) *

QUOTE(itv @ Jan 22 2008, 13:12) *
The background noise level in the playback situation would be nice to know, too :)

It is usually assumed that it is near 0 and it is better to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy-model at encode time, then the priorities of an encoder might change.

Problem is, most realistic usage scenarios for lossy audio involve flexible usage and a variable noise floor.

-k
Woodinville
QUOTE(knutinh @ Jan 22 2008, 04:57) *

QUOTE(muaddib @ Jan 22 2008, 13:44) *

QUOTE(itv @ Jan 22 2008, 13:12) *
The background noise level in the playback situation would be nice to know, too :)

It is usually assumed that it is near 0 and it is better to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy-model at encode time, then the priorities of an encoder might change.

Problem is, most realistic usage scenarios for lossy audio involve flexible usage and a variable noise floor.

-k



One of the important principles of building any psychoacoustic model, in my opinion, is the idea that "you have no idea what the user will do with their volume control". This is effectively a dual of what you just said. :)
muaddib
As for predicting the playback volume, assuming the worst-case scenario is, as I see it, the best solution. It is, however, very hard to achieve.
itv
QUOTE(knutinh @ Jan 22 2008, 06:57) *

QUOTE(muaddib @ Jan 22 2008, 13:44) *

QUOTE(itv @ Jan 22 2008, 13:12) *
The background noise level in the playback situation would be nice to know, too :)

It is usually assumed that it is near 0 and it is better to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy-model at encode time, then the priorities of an encoder might change.


Your guess is correct, and anyway, it was more of a tongue-in-cheek comment than a serious suggestion. I understand that since we don't know the actual levels, the psymodel must assume the worst-case scenario: high listening volume and low or non-existent background noise.

But come to think of it, there might be a few cases where the background noise could be made available to the encoder. Consider, for example, the situation where you want to encode some files and listen to them only while driving your car: if you could record the noise profile inside the car at a typical speed and give this info to the encoder, it would be able to compress the data much more efficiently than when the worst case (no noise) is assumed. Of course, you would probably be very annoyed if you stopped the car and continued to listen to the music. I don't know, maybe this would be worth considering if storage cost something like 100 times what it does now. But luckily, storage is cheap, and we can happily encode our music with enough bits to make it sound transparent in every situation :)
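A toy sketch of the car-noise idea, under heavy assumptions: the band levels are invented, background noise is taken to hide coding noise that sits a fixed margin below it, and each 6.02 dB of extra tolerated noise is counted as roughly one bit per sample saved in that band (the usual rule of thumb for uniform quantization). No real encoder's psymodel is being described here.

```python
DB_PER_BIT = 6.02  # approximate SNR change per bit of uniform quantization

def allowed_noise_db(quiet_threshold_db, background_noise_db, margin_db=6.0):
    """Per-band level below which coding noise is assumed inaudible:
    the quiet-room threshold, or the background noise minus a safety
    margin, whichever is higher (a deliberately crude model)."""
    return [max(q, n - margin_db)
            for q, n in zip(quiet_threshold_db, background_noise_db)]

def bits_saved_per_sample(quiet_threshold_db, allowed_db):
    """Rough per-band saving relative to the quiet-room assumption."""
    return [(a - q) / DB_PER_BIT for q, a in zip(quiet_threshold_db, allowed_db)]

# Hypothetical levels in dB SPL for four frequency bands, low to high.
quiet = [20.0, 5.0, 0.0, 10.0]     # threshold in a silent room
car = [70.0, 55.0, 40.0, 30.0]     # measured cabin noise at speed

allowed = allowed_noise_db(quiet, car)
print(allowed)  # [64.0, 49.0, 34.0, 24.0]
print([round(b, 1) for b in bits_saved_per_sample(quiet, allowed)])
```

If the car is parked and the cabin noise drops below the quiet thresholds, `allowed_noise_db` falls back to `quiet`, which is exactly the case where the car-tuned encode would start to sound bad.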
S-12
QUOTE(Paulhoff @ Dec 22 2007, 10:15) *
Will any model ever replace people senses, in anything that can be subjective, I doubt it.

I think it will, but it will most likely not be psychoacoustic models or waveform analyzers as we know them today, but some completely new technology we'll see in coming years.

Until then, the only viable way to objectively test the results of audio encoding is to double-blindly compare encoded samples with a lossless reference in enough cycles to get a meaningful test result.

Also, I don't think it would ever be useful to test something which is purely subjective - thankfully, digital media encoding has an aspect of subjectivity (individually preferred sound quality) and another of objectivity (perceptual variance from reference within a test group). The latter is what we may one day be able to measure in an automated way, but alas, to the best of my knowledge we cannot as yet.