Human hearing beats FFT
Feb 18 2013, 20:03
Post #26
Group: Members Posts: 22 Joined: 19-May 12 Member No.: 99992
Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.
This post has been edited by Willakan: Feb 18 2013, 20:03

Feb 18 2013, 20:22
Post #27
Group: Members Posts: 1468 Joined: 30-November 06 Member No.: 38207
Quote: or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.
That one was cute.
--------------------
geocities.com/hydrogenaudio: http://goo.gl/tqYZj

Feb 19 2013, 00:44
Post #28
Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957
Quote: Good God, I have been reading/getting PMed on other sites with a variety of ridiculous stuff about this article. So far, it apparently disproves sampling theorem (!) and, by an utterly incredible chain of logic, renders all existing measurement techniques worthless, either because there are ENORMOUS DISTORTIONS HIDING INSIDE THE FOURIERS or because...well...human hearing is nonlinear...erm...therefore all linear measurements are stupid and wrong...therefore tube amps. Or something.
Yeah, me too, intercourse-it.
--------------------
-----
J. D. (jj) Johnston

Feb 19 2013, 10:53
Post #29
ReplayGain developer Group: Developer Posts: 4586 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
It's amazing what people conclude, given that the experiment will have been carried out with digital audio signals (not analogue signal generators, and certainly not vinyl!), and the extreme "10x better than FFT" test clips will happily survive mp3 encoding.
Cheers, David.
This post has been edited by 2Bdecided: Feb 19 2013, 10:54

Feb 19 2013, 15:51
Post #30
Group: Developer Posts: 2983 Joined: 2-December 07 Member No.: 49183
I didn't find any mention of FFT in this article. Only "Fourier uncertainty principle" and "uncertainty limit".
This post has been edited by lvqcl: Feb 19 2013, 15:51

Feb 19 2013, 16:46
Post #31
Group: Super Moderator Posts: 3267 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796
The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?
This post has been edited by Canar: Feb 19 2013, 16:46
--------------------
(atrix|(fb2k->e-mu 0404 usb|audio 8 dj))->hd280|jvc ha-fx35-b

Feb 19 2013, 16:56
Post #32
Group: Super Moderator Posts: 9263 Joined: 1-April 04 Member No.: 13167
Quote: I didn't find any mention of FFT in this article. Only "Fourier uncertainty principle" and "uncertainty limit"
I tried this already. I guess I'm not the only one wondering how 10x better than "FFT" can survive going through an FFT process.
EDIT: added scary quotes. I don't wonder how a reversible process can satisfy the requirements of a non-linear system.
This post has been edited by greynol: Feb 19 2013, 17:04
--------------------
Everything sounds the same until it is proven otherwise.

Feb 19 2013, 17:27
Post #33
Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13
Quote: Is not some wavelet/filterbank transform more relevant than the FFT for comparing with human hearing?
Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use them exactly because of that reason, they use QMF filterbanks or similar. (This already implies that what's in that article isn't so shocking as you'd think)

Feb 19 2013, 17:32
Post #34
Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13
Quote: The first thing that comes to my mind is: Now how do we design some frequency transform that provides better results than human hearing? I honestly don't know. FFT has been the de-facto frequency transform in my head for far too long. My own attempt to hack around with wavelets never gave me better time/frequency resolution than your typical STFT. What other options do we have?
Parallel bandpass filters (PEAQ Advanced). More accurate, very slow.
Wavelets on the MDCT coefficients (Opus). Fast, can switch the T/F tradeoff depending on the signal.
The ear works more like the parallel filters setup.

Feb 19 2013, 21:07
Post #35
Xiph.org Speex developer Group: Developer Posts: 430 Joined: 21-August 02 Member No.: 3134
Quote: Yes, there's no reason to limit yourself to FFT. The most advanced psymodels don't use them exactly because of that reason, they use QMF filterbanks or similar. (This already implies that what's in that article isn't so shocking as you'd think)
FFTs, MDCTs, QMFs and other filter banks are all fundamentally bound by the uncertainty principle: the product of the frequency resolution and time resolution cannot be smaller than 1. This is the case for any non-parametric model/transform, i.e. when you don't make any particular assumptions about your signal.
There are however parametric models one can use. The best example is a model where you directly fit sinusoids of arbitrary frequencies (as opposed to Fourier, which uses sinusoids of predetermined frequencies). With such a model, the resolution is only limited by practical concerns like noise, other sinusoids, and modulation effects. As a trivial example, if you give me three samples and promise that they represent only a single sinusoid (no noise or modulation), then I can calculate the exact frequency of that sinusoid.
So in theory, sinusoidal modeling solves all the time-freq issues of the FFT. The only problem is that it's damn hard to use, especially when it comes to having a good enough analysis. And that's why we don't have any high-quality sinusoidal-based audio codecs.
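The three-sample claim above can be sketched in a few lines. This is only an illustration, not anything posted in the thread: it assumes a noise-free sinusoid below the Nyquist frequency and uses the identity x[n-1] + x[n+1] = 2*cos(w)*x[n]. The function name, sample rate and test frequency are hypothetical.

```python
import math

def frequency_from_three_samples(x0, x1, x2, sample_rate):
    """Estimate the frequency of a pure sinusoid from three consecutive samples.

    Relies on x[n-1] + x[n+1] = 2*cos(w)*x[n], which holds for any
    A*sin(w*n + phi) with no noise or modulation (assumption).
    """
    if x1 == 0:
        raise ValueError("middle sample is zero; shift the window by one sample")
    cos_w = (x0 + x2) / (2.0 * x1)
    cos_w = max(-1.0, min(1.0, cos_w))  # guard against rounding just outside [-1, 1]
    w = math.acos(cos_w)                # radians per sample, in [0, pi]
    return w * sample_rate / (2.0 * math.pi)

# Hypothetical test signal: 1234.5 Hz at 48 kHz, arbitrary amplitude and phase.
fs = 48000.0
f_true = 1234.5
samples = [0.7 * math.sin(2 * math.pi * f_true * n / fs + 0.3) for n in range(3)]
estimate = frequency_from_three_samples(*samples, fs)
```

Three samples span about 0.06 ms here, far shorter than any transform window, yet the estimate is limited only by floating-point rounding. Add noise or a second sinusoid and the assumption, and with it the precision, breaks down.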

Feb 20 2013, 19:51
Post #36
Group: Members Posts: 106 Joined: 3-June 05 From: Coconut Creek Fl Member No.: 22486
For those in the know..........
The Princess and the Pea. That sums it all up.
Paul
This post has been edited by Paulhoff: Feb 20 2013, 20:12
--------------------
"Reality is merely an illusion, albeit a very persistent one." Albert Einstein

Feb 21 2013, 04:55
Post #37
Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957
It's still a confused headline. Recognizing one of a set of different sine waves is not limited by the Gabor limit.
Observing that that is not limited by the Gabor limit is like observing that white is not limited by aircraft.
--------------------
-----
J. D. (jj) Johnston

Feb 21 2013, 07:54
Post #38
Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13

Feb 21 2013, 10:02
Post #39
Group: Members Posts: 1049 Joined: 16-February 08 From: NL Member No.: 51347
Yeah, we don't want to turn HA into an extension of horse_ebooks.

Feb 22 2013, 04:22
Post #40
Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957
Quote: It's still a confused headline.... is like observing that white is not limited by aircraft.
Quote: I'm willing to rename this thread to "white is not limited by aircraft" but I don't think it'll make things better
No better. Just as meaningful.
--------------------
-----
J. D. (jj) Johnston

Feb 25 2013, 13:14
Post #41
ReplayGain developer Group: Developer Posts: 4586 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Never mind the title, I still don't find a satisfactory answer in this thread.
I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue.
However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.
So, simply, what is the reason that this is OK?
Cheers, David.

Feb 25 2013, 15:32
Post #42
Group: Members Posts: 514 Joined: 1-November 06 Member No.: 37047
Quote: Never mind the title, I still don't find a satisfactory answer in this thread. I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue. However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort. So, simply, what is the reason that this is OK? Cheers, David.
I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not doing a proper modelling of our hearing apparatus?
Not all audio processing/transmission may need to include an accurate model of our hearing. Perhaps a crude STFT is simply sufficient for some applications.
So what if we devised an insanely complex, irregular, nonlinear filterbank (Volterra filterbank?). What could it be used for? Better lossy coding? (I think that there are other tradeoffs in lossy coding as well, such as signal compaction). Could we make better "frequency analyzers"? (what engineers would be able to interpret the plots from such a device?).
-k

Feb 25 2013, 17:24
Post #43
ReplayGain developer Group: Developer Posts: 4586 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Quote: I guess the switching between two different time/frequency resolution transforms in many lossy codecs is a sort of "ad hoc" fix for not doing a proper modelling of our hearing apparatus?
Not in the sense discussed in this paper. If that lossy codec filterbank and/or transform defined/trashed the performance that's measured in this paper (it doesn't), then even with optimal choice of transform length options and optimal switching between them, the result would be 10x too bad.
I think your other two paragraphs are right though.
I'd just love to see a robust scholarly explanation, because I think we're going to need it after this paper.
Cheers, David.

Feb 26 2013, 00:10
Post #44
Xiph.org Speex developer Group: Developer Posts: 430 Joined: 21-August 02 Member No.: 3134
Quote: Never mind the title, I still don't find a satisfactory answer in this thread. I understand that the human ear uses a wobbling membrane as something like a filter bank, with a number of non-linear processes, and an amazing analysis of the signals coming from it, to deliver the hearing capacities that we can probe in listening tests and experience every day. I understand that this is nothing like an FFT. I understand that the frequency resolution of masked noise is not that critical, so we use FFTs in codecs in a place where their frequency resolution is far over-specified, rather than being an issue. However, we often describe other things in audio and hearing with an FFT-like model. It crops up in sampling theory. We push all the audio through a comparable filterbank in most lossy codecs. It is true that these transforms are mathematically lossless/reversible - but if we're messing with things in the other domain, this is little comfort.
If you want to think about this in terms of FFTs... consider the case of a 10 ms FFT window. The resolution of that FFT is 100 Hz. Does this mean we can't tell the frequency of a sinusoid with better than 100 Hz accuracy using that FFT? Absolutely not. First, we can use interpolation with the neighbouring bins to get a more precise value. If we have FFTs at other time offsets, we can do even better. We can look at phase changes for a certain bin and compute the exact (within noise limits) frequency of the sinusoid that's around that bin. So we've again "beaten Heisenberg", but only because we've assumed that we have a single sinusoid around that bin.
AFAIK, the human ear is capable of similar phase processing to figure out the frequency. It has to do something like that because its "critical bands" are far wider than the bins of a 10 ms FFT. There's only ~25 critical bands for the entire 20 Hz - 20 kHz spectrum.
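The phase-change trick described above can be sketched as follows. This is a rough illustration under stated assumptions, not the poster's code: a single clean sinusoid, a Hann window, a direct-sum DFT instead of an FFT library, and only the phase-difference refinement (the neighbouring-bin interpolation is left out). All names and the test frequency are hypothetical.

```python
import cmath
import math

def dft_bin(frame, k):
    """Return DFT bin k of a Hann-windowed frame (direct sum, no FFT library)."""
    n_len = len(frame)
    total = 0j
    for n, x in enumerate(frame):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_len)
        total += window * x * cmath.exp(-2j * math.pi * k * n / n_len)
    return total

def refine_frequency(signal, start, frame_len, hop, sample_rate):
    """Estimate one sinusoid's frequency from the phase advance of the peak bin."""
    frame_a = signal[start:start + frame_len]
    frame_b = signal[start + hop:start + hop + frame_len]
    # Coarse estimate: the strongest bin; bins are sample_rate / frame_len apart.
    peak = max(range(1, frame_len // 2), key=lambda k: abs(dft_bin(frame_a, k)))
    bin_a = dft_bin(frame_a, peak)
    bin_b = dft_bin(frame_b, peak)
    # Phase advance expected if the sinusoid sat exactly on the bin centre.
    expected = 2 * math.pi * peak * hop / frame_len
    measured = cmath.phase(bin_b) - cmath.phase(bin_a)
    # Wrap the deviation into [-pi, pi) and convert it to a frequency offset.
    deviation = (measured - expected + math.pi) % (2 * math.pi) - math.pi
    coarse = peak * sample_rate / frame_len
    fine = coarse + deviation * sample_rate / (2 * math.pi * hop)
    return coarse, fine

# Hypothetical test: 10 ms frames at 48 kHz, so bins are 100 Hz apart.
fs = 48000.0
frame_len = 480
f_true = 1234.5
signal = [math.cos(2 * math.pi * f_true * n / fs + 0.4) for n in range(1000)]
coarse, fine = refine_frequency(signal, 0, frame_len, frame_len // 4, fs)
```

The coarse value is stuck on the 100 Hz grid while the refined one lands far closer to the true frequency, but only because the code assumes exactly one sinusoid near that bin, which is the caveat the post makes.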

Feb 26 2013, 02:25
Post #45
Group: Developer Posts: 304 Joined: 29-April 11 From: Austria Member No.: 90198
<deleted>
This post has been edited by xnor: Feb 26 2013, 15:10

Feb 26 2013, 04:30
Post #46
Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957
There are a number of issues confused in this thread.
The first is the idea that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown. Hearing the difference between notes is not at all the same problem.
The second is that this 'beats FFT'. It beats the single-bin resolution of an FFT, but once you know you're dealing with a single cycle of a single sine wave, that problem becomes moot, because an FFT is 1:1 and onto, i.e. orthonormal, tight frame, etc, and the information is all retained. So, yes, it is there in the FFT that has wider bands, just not in the usual way one would extract it. The GABOR LIMIT DOES NOT APPLY TO THIS DETECTION ISSUE, and YES, Batman, the FFT can be used in such detection, it's just a dumb way to do it.
Third, the ear has about 60 Hz bands until you get to the point where 1/4 octave is wider, and then they are 1/4 octave wide, give or take. This has little bearing on the actual frequency detection mechanism, because the phase of firing of neurons is radically different below and above the center frequency of a given hair cell. This, alone, to 500 Hz, can suffice to demonstrate pitch detection ability. And since the filters are wide, they settle fast, and hence again we beat the Gabor limit, because we know we're looking for ONE set of frequencies, not any arbitrary frequency.
So, the headline is just confused, it's comparing an apple, an orange, and a crate full of bowling balls, and concluding that apples are orange-colored and weigh 12 lbs.
--------------------
-----
J. D. (jj) Johnston
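The "1:1 and onto" point can be checked numerically. The following is a small sketch, assuming an orthonormal DFT computed by direct summation and a made-up test signal (one cycle of a sine followed by silence). It shows only that the transform loses nothing, not how hearing works.

```python
import cmath
import math

def dft(x, inverse=False):
    """Orthonormal DFT (scaled by 1/sqrt(N)) computed by direct summation."""
    n_len = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n_len)
    return [
        scale * sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / n_len)
                    for n in range(n_len))
        for k in range(n_len)
    ]

# Hypothetical test signal: a single 20-sample cycle of a sine, then silence,
# so its energy is smeared across many bins.
n_len = 64
signal = [math.sin(2 * math.pi * n / 20) if n < 20 else 0.0 for n in range(n_len)]

spectrum = dft(signal)
recovered = dft(spectrum, inverse=True)

# The inverse transform returns the input up to rounding, and the energy is
# the same in both domains (Parseval), so no information was discarded.
round_trip_error = max(abs(r - s) for r, s in zip(recovered, signal))
energy_time = sum(s * s for s in signal)
energy_freq = sum(abs(c) ** 2 for c in spectrum)
```

Whatever timing detail a listener can use is still encoded in the coefficients, mostly in their phases; reading one bin's magnitude is simply the wrong place to look for it.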

Feb 26 2013, 10:57
Post #47
ReplayGain developer Group: Developer Posts: 4586 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
Thank you JJ.

Feb 27 2013, 01:51
Post #48
Group: Members Posts: 2082 Joined: 18-December 03 Member No.: 10538
http://arstechnica.com/science/2013/02/hum...3s-sound-worse/
This post has been edited by krabapple: Feb 27 2013, 01:52

Feb 27 2013, 02:44
Post #49
Group: Super Moderator Posts: 9263 Joined: 1-April 04 Member No.: 13167
Well, the quality of the comments looks pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.
--------------------
Everything sounds the same until it is proven otherwise.

Feb 27 2013, 05:55
Post #50
Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957
Quote: Well the quality of the comments look pretty encouraging, though I imagine the section's entropy will increase, especially after the more informed people get tired of participating.
I don't belong to that particular site. If somebody would like to convey my feeling, please feel free. I'm tired of dealing with what I can only describe as 'poo flinging' in most of the audio press.
--------------------
-----
J. D. (jj) Johnston