Human hearing beats FFT |
Mar 14 2013, 17:39
Post
#51
|
|
|
Group: Members Posts: 147 Joined: 31-July 08 Member No.: 56508 |
The first is that the Gabor limit applies. The Gabor limit only applies when what you need to detect is completely unknown.

Speaking of Gabor, there's a nice piece of prior art from 1946 suggesting that “human hearing beats FFT”:

QUOTE
Actually, as noted by Dennis Gabor (best known for his invention of holography, but who also worked in audio) back in 1946, the ears actually analyse the frequency content of sounds in time faster than suggested by the uncertainty principle by a factor of about 7. The seeming logical contradiction with the fundamental theoretical limit of time/frequency resolution is avoided by the ear’s use of a-priori or previously assumed knowledge of the nature of typical sounds but at the expense of getting the analysis ‘wrong’ when sounds not of the assumed form occur.

(quote taken from M. Gerzon's paper)
|
|
|
|
Mar 15 2013, 12:15
Post
#52
|
|
![]() ReplayGain developer Group: Developer Posts: 4587 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409 |
Interesting paper, thank you.
|
|
|
|
Apr 2 2013, 01:03
Post
#53
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
EST is a new transform that can explain the results of the article. Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible. EST derives frequencies from samples and is unrelated to Fourier/FFT. The process of EST is deterministic, does not use non-linear equations, and can handle noise.

In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples. A noisy signal will require more samples, depending on the noise level. Beyond that minimum for the ideal case, accuracy does not depend on the number of samples (time); the additional samples for a noisy signal are needed only to handle the noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In that case, a noiseless signal requires 4 samples per increasing/decreasing sinusoid, and a noisy signal requires more.

EST can be evaluated using a demo program that implements it. There is also a paper that details the transform and its mathematical basis. Those interested in seeing the paper and/or the demo program can email me at gringya atsign gmail dot com.
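To make the "3n samples" figure concrete for the simplest case n = 1, here is a minimal sketch. It relies only on the textbook identity x[0] + x[2] = 2*cos(omega)*x[1] for an undamped sinusoid; it is not taken from the EST paper, and the function name `fit_single_sinusoid` is made up for this illustration. It assumes a noiseless signal x[t] = A*cos(omega*t + phase) with 0 < omega < pi and a non-zero middle sample.

```python
import math

def fit_single_sinusoid(x0, x1, x2):
    """Recover (amplitude, omega, phase) of x[t] = A*cos(omega*t + phase)
    from three equally spaced noiseless samples taken at t = 0, 1, 2."""
    if x1 == 0:
        raise ValueError("middle sample is zero; the identity cannot be solved for omega")
    # For an undamped sinusoid: x0 + x2 = 2*cos(omega)*x1
    c = (x0 + x2) / (2.0 * x1)
    # Guard against rounding pushing the cosine slightly outside [-1, 1]
    c = max(-1.0, min(1.0, c))
    omega = math.acos(c)
    s = math.sin(omega)
    if s == 0:
        raise ValueError("omega is 0 or pi; amplitude and phase are not separable")
    # Write x[t] = p*cos(omega*t) - q*sin(omega*t), with p = A*cos(phase), q = A*sin(phase)
    p = x0
    q = (x0 * c - x1) / s
    return math.hypot(p, q), omega, math.atan2(q, p)
```

With samples generated from A = 1.5, omega = 0.7, phase = 0.3, the function returns those three values up to rounding. Any noise on the samples feeds straight into the result, which is consistent with the remark that noisy signals need more samples.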
|
|
|
Apr 2 2013, 23:47
Post
#54
|
|
![]() Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE
Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

Which, of course, depends entirely on your definition of "Frequency", something that itself is trickier than some seem to realize.

QUOTE
EST derives frequencies from samples and is unrelated to Fourier/FFT.

What does "EST" stand for, in the first place? Does it use a complex exponential or a representation of a complex exponential?

QUOTE
The process of EST is deterministic, does not use non-linear equations, and can handle noise.

Which is true of the Fourier Transform, as well.

QUOTE
In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n equally spaced real samples.

Sounds pretty good. What's the basis set you're using? Sounds a lot like a * sin(b*t + c) where a, b, c are the 3 samples. Not sure what "equally spaced" means here, unless you're referring to the fact you can characterize a sine wave with 3 non-degenerate points.

QUOTE
A noisy signal will require more samples, depending on noise level.

No surprise.

QUOTE
Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal are needed to handle noise. EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

So it's Laplace-based instead of Fourier-based, then?

Instead of bombarding us with a bunch of not-very-specific qualities, why not just tell us what the basis set is, and how the analysis works? I am aware of approximately infinite (well, literally infinite but obviously I haven't generated them all!) numbers of basis sets, many of which this could describe.

--------------------
J. D. (jj) Johnston |
|
|
|
Apr 2 2013, 23:59
Post
#55
|
|
|
Group: Members Posts: 147 Joined: 31-July 08 Member No.: 56508 |
Yaakov, also check out the Reassigned spectrogram mode in iZotope RX. It “beats FFT” in terms of time and frequency resolution: it can precisely localize impulsive events in time and precisely display frequencies of harmonics, assuming that they do not overlap in FFT spectrum.
|
|
|
|
Apr 3 2013, 01:42
Post
#56
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
EST stands for Exponential Sum Transform and it uses complex exponentials.

The basis is sigma(c*b^t), where b and c are non-zero complex numbers and the b values are distinct. If all b are on the unit circle, then it is simply a spectrum. When all b are on the unit circle and the samples are real, this becomes sigma(a*cos(b*t+c)).

The samples must be equally spaced, not just non-degenerate. It clearly looks more like Laplace than Fourier, but a specific relation, if one exists, is not known to me.

As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?
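The step from the complex form to the cosine form can be checked numerically. The sketch below is only an illustration of the standard identity c*b^t + conj(c)*conj(b)^t = 2*|c|*|b|^t*cos(arg(b)*t + arg(c)); the helper names `exponential_sum` and `conjugate_pair_as_cosine` are hypothetical and do not come from the EST paper. Note that a, b and c in the cosine form above are real amplitude, frequency and phase, not the complex b and c of the exponential sum.

```python
import cmath
import math

def exponential_sum(terms, t):
    """Evaluate sum(c * b**t) over a list of (b, c) pairs at integer time t."""
    return sum(c * b**t for b, c in terms)

def conjugate_pair_as_cosine(b, c):
    """For the pair (b, c) and (conj(b), conj(c)), return (a, r, omega, phi)
    such that the pair sums to a * r**t * cos(omega*t + phi).
    r == 1 corresponds to b on the unit circle (a plain sinusoid)."""
    return 2.0 * abs(c), abs(b), cmath.phase(b), cmath.phase(c)

# Example: one conjugate pair on the unit circle
b = cmath.rect(1.0, 0.9)    # radius 1, angle 0.9 rad per sample
c = cmath.rect(0.5, -0.4)   # magnitude 0.5, angle -0.4 rad
pair = [(b, c), (b.conjugate(), c.conjugate())]
a, r, omega, phi = conjugate_pair_as_cosine(b, c)
for t in range(4):
    direct = exponential_sum(pair, t)
    as_cosine = a * r**t * math.cos(omega * t + phi)
    print(t, direct.real, as_cosine)  # the two columns agree up to rounding
```

With |b| different from 1 the same pair gives an exponentially increasing or decreasing sinusoid, which is the case described as closer to Laplace than Fourier.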
|
|
|
Apr 3 2013, 05:27
Post
#57
|
|
![]() Group: Super Moderator Posts: 3267 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796 |
I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though.
-------------------- (atrix|(fb2k->e-mu 0404 usb|audio 8 dj))->hd280|jvc ha-fx35-b
|
|
|
|
Apr 3 2013, 18:14
Post
#58
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
|
|
|
|
Apr 3 2013, 18:31
Post
#59
|
|
|
Group: Super Moderator Posts: 4344 Joined: 23-June 06 Member No.: 32180 |
If I may guess, I think he means that this site has a significant number of users who would appreciate detailed descriptions. However, that is not to stop you from providing less technical information (i.e. ‘layman’s terms’) if you want to; there are probably other users who would like that, too.
|
|
|
|
Apr 3 2013, 20:34
Post
#60
|
|
![]() Group: Members Posts: 1468 Joined: 30-November 06 Member No.: 38207 |
I think I could very well use a formula or two ... point seven eighteen twentyeight ...
QUOTE
As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?

I think I just got one that was a bit too rough.

This post has been edited by Porcus: Apr 3 2013, 20:37
--------------------
geocities.com/hydrogenaudio: http://goo.gl/tqYZj
|
|
|
|
Apr 3 2013, 22:10
Post
#61
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
The following link:
http://www.mediafire.com/view/?ce47jurz43wzjce is to a short document that describes the EST process for real noiseless samples. |
|
|
|
Apr 11 2013, 11:09
Post
#62
|
|
![]() Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
Hm. Define "noiseless". Most instruments have a chaotic part of their performance that in fact is noiselike in that it does not repeat, is not entirely stationary, depends on technique, and so on.
So, I'm not quite sure I know what you mean by noiseless. -------------------- -----
J. D. (jj) Johnston |
|
|
|
Apr 11 2013, 19:33
Post
#63
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
The paper described the mathematical basis of EST, which uses the ideal case of perfect increasing/decreasing sinusoids.

For realistic data, EST uses different processes that expect noise. For audio, the EST process is as follows.

1. Find linear prediction coefficients, preferably using the covariance method and not the auto-correlation method.
2. Create the linear prediction polynomial.
3. Find the roots of the linear prediction polynomial to establish the basis set of an exponential sum function, as described in the paper.
4. Use the samples and the basis set to find the coefficients of the function.

The key point is that linear prediction coefficients and an exponential sum function are equivalent, with the exponential sum function having the distinct advantage of being an analytic function with a useful structure. The mathematical basis proves this equivalence. Due to the equivalence, an exponential sum function models an audio signal with the same quality as linear prediction.

You may note that the best lossless audio compressors, like OptimFROG, use linear prediction. This is a strong indication of the power of linear prediction to model audio. Since EST generates an analytic function, it is suitable for lossy audio compression, as well as other audio applications.

Once EST has generated an exponential sum function, you can do the following:

Identify noise elements, using frequency and/or amplitude, and remove them.
Identify inaudible elements, and remove them.
Quantize the coefficients.
Resample the audio signal, both sample rate and sample depth.
And various other things.

Unlike Fourier-related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.
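The four steps (prediction coefficients, polynomial, roots, coefficients) can be sketched for the smallest non-trivial case: one increasing/decreasing sinusoid, prediction order 2, exactly 4 noiseless samples. This is a Prony-style illustration under those assumptions, not the EST implementation; the function name `fit_order2_exponential_sum` is made up here, and a real encoder would use a higher order, more samples per block and a least-squares (covariance-method) solve instead of the exact 2x2 solve.

```python
import cmath

def fit_order2_exponential_sum(x):
    """Fit x[t] = c1*b1**t + c2*b2**t to 4 equally spaced real samples.
    Returns [(b1, c1), (b2, c2)]."""
    x0, x1, x2, x3 = x

    # Step 1: prediction coefficients for x[t] = a1*x[t-1] + a2*x[t-2],
    # solved exactly from the equations at t = 2 and t = 3 (Cramer's rule).
    det = x1 * x1 - x0 * x2
    if det == 0:
        raise ValueError("samples are degenerate for an order-2 model")
    a1 = (x2 * x1 - x0 * x3) / det
    a2 = (x1 * x3 - x2 * x2) / det

    # Step 2: prediction polynomial z**2 - a1*z - a2.
    # Step 3: its roots are the basis values b1, b2 (quadratic formula).
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    b1 = (a1 + disc) / 2.0
    b2 = (a1 - disc) / 2.0
    if b1 == b2:
        raise ValueError("repeated root; the basis values must be distinct")

    # Step 4: coefficients from the first two samples:
    #   c1 + c2 = x0   and   c1*b1 + c2*b2 = x1
    c1 = (x1 - x0 * b2) / (b1 - b2)
    c2 = x0 - c1
    return [(b1, c1), (b2, c2)]
```

For samples of a decaying sinusoid such as 2.0 * 0.95**t * cos(0.6*t + 0.25), the roots come out as a conjugate pair with magnitude 0.95 and angle ±0.6, and the fitted function reproduces later samples that were not used in the fit. That extrapolation is the linear prediction and exponential sum equivalence in miniature.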
|
|
|
Apr 11 2013, 20:36
Post
#64
|
|
![]() Group: Members Posts: 1354 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?

--------------------
J. D. (jj) Johnston |
|
|
|
Apr 11 2013, 21:32
Post
#65
|
|
|
Group: Members Posts: 6 Joined: 1-April 13 Member No.: 107483 |
QUOTE
QUOTE
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data. In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz". Let me know how big your covariance matrix is, too, ok?

In a practical implementation the samples will be broken into blocks, and a matrix size will be chosen for that block size. The matrix size and the block size together determine the accuracy and the accuracy-speed trade-off. This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function. 30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.
|
|
|