Interesting Histograms. |
May 30 2011, 02:08
Post
#26
|
|
![]() Group: Members (Donating) Posts: 1983 Joined: 4-January 04 From: Austin, TX Member No.: 10933 |
Then I posted "GNU Octave fucking blows" on Facebook.
Now see jj, if I friended you, you would have seen that comment, and you wouldn't have had to go through all that trouble, eh
I somewhat associate 'Axon' with wooing Woodinville for quite some time now, but I think that's a new level.
You have no idea.
I'll bug the admins. I can't do this. In the interim, zip the .m and post that?
Well, screen-copying the text above into octave works like a champ. It's not like there's any special character stuff in it.
Right, but, my hacked audio package is 380 lines of code, my test for said hack was another 50... it starts to add up.
In any case. I have uploaded everything I've got HERE. Have at it.
|
|
|
May 30 2011, 06:14
Post
#27
|
|
![]() Group: Super Moderator Posts: 3268 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796 |
QUOTE Right, but, my hacked audio package is 380 lines of code, my test for said hack was another 50... it starts to add up.
[ codebox ] will work.
This post has been edited by Canar: May 30 2011, 06:14 -------------------- (atrix|(fb2k->e-mu 0404 usb|audio 8 dj))->hd280|jvc ha-fx35-b
|
|
|
|
May 30 2011, 06:53
Post
#28
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
|
|
|
|
May 30 2011, 07:15
Post
#29
|
|
![]() Group: Super Moderator Posts: 3268 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796 |
More generally,
CODE
float scale(int value){
    float offset=(MAX_INT+MIN_INT)/2;
    float range=(MAX_INT-MIN_INT)/2;
    return (value+offset)/range;
}
MAX_INT and MIN_INT are the maximum and minimum values for the int type, which is a fixed-point number.
This post has been edited by Canar: May 30 2011, 07:19
Reason for edit: cleaning up
|
|
|
|
May 30 2011, 07:31
Post
#30
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
But not for wav (or au, or aiff) files, where 0 is defined to be the midpoint ("offset" in the code).
Edit: not sure if the code has representational issues: mathematically, (MAX_INT+MIN_INT)/2 = -0.5
This post has been edited by bandpass: May 30 2011, 07:40
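The representational concern can be made concrete with a small Python sketch. It assumes 16-bit samples; `MAX_INT` and `MIN_INT` follow the names in the snippet above, while `c_int_div` is a hypothetical helper that mimics C's truncating integer division. It is only meant to show how the exact midpoint of -0.5 collapses to 0 in integer arithmetic.

```python
MAX_INT = 32767    # largest 16-bit two's complement value (assumed width)
MIN_INT = -32768   # smallest 16-bit two's complement value


def c_int_div(a, b):
    """Mimic C integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


exact_offset = (MAX_INT + MIN_INT) / 2.0        # mathematical midpoint
c_offset = c_int_div(MAX_INT + MIN_INT, 2)      # what integer division gives
exact_range = (MAX_INT - MIN_INT) / 2.0         # mathematical half-range
c_range = c_int_div(MAX_INT - MIN_INT, 2)       # truncated half-range

print(exact_offset, c_offset, exact_range, c_range)
```

Under these assumptions both the offset and the half-range lose their fractional half when computed with integers, so the two results differ by half an LSB.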
|
|
|
May 30 2011, 08:41
Post
#31
|
|
![]() Group: Super Moderator Posts: 3268 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796 |
That code should map int values to 0..1 inclusive, ideally, ignoring external definition of midpoint. Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values. Seems to be a really weird engineering decision. Do you have a citation for defining 0 to be the midpoint?
This post has been edited by Canar: May 30 2011, 08:42
|
|
|
|
May 30 2011, 08:59
Post
#32
|
|
![]() Group: Members Posts: 607 Joined: 16-January 09 Member No.: 65630 |
The libsndfile data I was referring to was in e-notation, and everything seemed OK with 16-digit Octave floats (not to mention the 4-6 digits from Cool Edit). After changing things as xnor noted, formatting everything to 16-digit floats, and double-checking, it seems fine again.
Apologetic IPy version of those histograms: [image not preserved]
However, a literal translation of Woodinville's script, using libsndfile, seems even faster:
CODE
# Runs in an IPython pylab session, where zeros, maximum, arange
# and semilogy are already in the namespace.
def histo(fn):
    import scikits.audiolab as au
    (sp, sf, b) = au.wavread(fn)
    sp = sp.conj().transpose()
    his = zeros((65536,), dtype=int)
    for i in range(len(sp[1])):
        for j in (0, 1):
            # Octave indexes from 1 (hence +32769 there); Python indexes
            # from 0, so the offset is 32768 and the index must be an int.
            t = int(round(sp[j, i] * 32768 + 32768))
            his[t] += 1
    his = maximum(his/float(sum(his)), .000000000001)
    xax = arange(-32768,32768)
    semilogy(xax,his)
Compared to the histo loop in Octave:
CODE
fname='xerrox.wav';
x=wavread(fname);
x=x';
len=length(x)
his(1:65536)=0;
low=32768;
high=32768;
windd=hann(2048)';
sfmmean=0;
nmeas=0;
for ii=1:len
  for jj=1:2
    t=x(jj,ii);
    if ( t < -1)
      t=-1
      fflush(stdout);
    end
    if (t >65535/65536)
      t=65535/65536
      fflush(stdout);
    end
    t=round(t*32768+32769);
    his(t)=his(t)+1;
    if (t < low) low=t; end
    if (t > high) high=t; end
  end
end
tot=sum(his);
his=his/tot;
his=max(his, .000000000001);
xax=-32768:32767;
The IPy version is ~5 times faster, using "tic; run; toc" in Octave and "%timeit" in IPython.
The whole loop (excluding the simple arithmetic outside the main loop) is ~12 times faster with IPy using numpy built against MKL: http://pastebin.com/taKbfCDk
Not sure if the ratio depends on track length.
Also, IPy's parallel and multicore processing features weren't used, and the code isn't simplified as much as it could be, I imagine. It is literally translated to IPy, mainly because I don't understand what's going on at first sight, and translating was already trouble.
I used "whos" and slicing in both Octave and IPy to check that the variables match and everything is OK.
Not sure what the purpose of the "kk" for loop is: kk is never used if it is intended to process both channels' data, and even if it were, it seems to me the "jj" for loop could take care of that. Or perhaps I use Octave too rarely.
-------------------- Scripts (mainly foobar2000 related): http://goo.gl/yje3h
|
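For readers without Octave or scikits.audiolab, the binning step from the two listings above can be sketched in plain Python. This is an illustration only, not the script from the thread: `pcm_histogram` and its `nbits` parameter are hypothetical names, and the input is assumed to be a flat list of samples already normalized to [-1, 1).

```python
def pcm_histogram(samples, nbits=16):
    """Relative frequency of each integer code among normalized samples."""
    half = 1 << (nbits - 1)          # 32768 for 16-bit audio
    counts = [0] * (2 * half)        # one bin per code; index 0 is -half
    for x in samples:
        code = int(round(x * half))              # back to an integer code
        code = max(-half, min(half - 1, code))   # clamp out-of-range input
        counts[code + half] += 1                 # 0-based bin index
    total = float(sum(counts)) or 1.0
    # Floor at 1e-12 so empty bins survive a logarithmic plot.
    return [max(c / total, 1e-12) for c in counts]


hist = pcm_histogram([-1.0, 0.0, 0.0, 32767 / 32768.0])
print(hist[0], hist[32768], hist[65535])
```

The clamp plays the role of the two range checks in the Octave loop, and the `+ half` offset is the 0-based counterpart of Octave's 1-based `+32769`.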
|
|
|
May 30 2011, 09:10
Post
#33
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
QUOTE That code should map int values to 0..1 inclusive, ideally, ignoring external definition of midpoint. Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values. Seems to be a really weird engineering decision. Do you have a citation for defining 0 to be the midpoint?
http://msdn.microsoft.com/en-us/library/ms...audiodataformat
Weird, certainly: lots of special-case code needed and/or DC-offsets creeping in.
|
|
|
May 30 2011, 09:44
Post
#34
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
QUOTE Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values.
It doesn't have to imply that. The negative range can also be interpreted to just have more headroom. When you convert from a balanced to an unbalanced encoding, the max negative symbol stays unused. When you convert from an unbalanced to a balanced encoding, there indeed needs to be special case handling. Best general guidance would be not using the max negative symbol at all and asking for user feedback when you encounter it on the input pipeline.
It is a good thing, that 0 is defined as the midpoint. Else detecting silence would be a PITA. When the first PCM formats were developed, these differences were probably too far below the analog noise floor of the best converters to really cause any concern.
This post has been edited by googlebot: May 30 2011, 09:49
|
|
|
May 30 2011, 10:20
Post
#35
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
QUOTE Best general guidance would be not using the max negative symbol at all and asking for user feedback when you encounter it on the input pipeline.
Should an ADC request user intervention whenever its input is < 1/2^n of its range?
QUOTE It is a good thing, that 0 is defined as the midpoint. Else detecting silence would be a PITA.
I'm not sure that's useful though; in practice, silence is defined in (finite) dB.
|
|
|
May 30 2011, 11:04
Post
#36
|
|
![]() Group: Developer Posts: 304 Joined: 29-April 11 From: Austria Member No.: 90198 |
I don't see the problem. PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
This post has been edited by xnor: May 30 2011, 11:07
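The convention described in this post can be sketched in Python as follows. This is a minimal illustration under the post's stated assumptions (1.0 maps to 2^(nbits-1), out-of-range results are clipped); the function names `to_float` and `invert_saturating` are hypothetical.

```python
def to_float(code, nbits=16):
    """Scale an integer PCM code so that 2**(nbits-1) corresponds to 1.0."""
    return code / float(1 << (nbits - 1))


def invert_saturating(code, nbits=16):
    """Negate a PCM code, clipping the result to the representable range."""
    hi = (1 << (nbits - 1)) - 1      # +32767 for 16 bits
    lo = -(1 << (nbits - 1))         # -32768 for 16 bits
    return max(lo, min(hi, -code))


print(to_float(-32768), to_float(32767))
print(invert_saturating(-32768))   # 32767
```

With this scaling the most negative code lands exactly on -1.0, the most positive code falls one step short of +1.0, and the clip is what turns the inverse of -32768 into +32767.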
|
|
|
May 30 2011, 11:10
Post
#37
|
|
![]() Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE You can go into the library and change it to the proper 2^(n-1) without much trouble.
QUOTE 2^(n-1) is still problematic (though perhaps not for histograms). E.g. a wav containing a minimum value cannot be inverted.
That is correct, and proper. That is the nature of 2's complement: there is only one '0' entry, and thus one extra negative entry.
And, yes, if the quantizer is done as a standard PCM quantizer, there must be a zero reconstruction level. That is, after all, the definition. Yes. Really.
-------------------- -----
J. D. (jj) Johnston |
|
|
|
May 30 2011, 11:17
Post
#38
|
|
![]() Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE Not sure what's the purpose of "kk - for loop": kk is never used if intended to process both channel data, and even if it should, "jj - for loop" seems to me it could take care. Or perhaps I use Octave rarely
It's for doing the spectral flatness measure, but I may have indeed messed up, let me look.
Yep, messed up. The '1' in the array reference should be 'kk'. Sigh. I'll see if the SFM is much different. It's unlikely.
Oh, and yes, Octave is astonishingly slow, even when you only do the histogram stuff. In fact, the SFM stuff adds surprisingly little to the run time, which is bizarre. Even without bounds checking it's slow, slow, slow. No idea why. Matlab is quite a bit faster, but I don't have it here.
This post has been edited by Woodinville: May 30 2011, 11:20
-------------------- -----
J. D. (jj) Johnston |
|
|
|
May 30 2011, 13:00
Post
#39
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
QUOTE Should an ADC request user intervention whenever its input is < 1/2^n of its range?
No, just fall back to the usual behavior defined for all out-of-range values.
QUOTE I'm not sure that's useful though; in practice, silence is defined in (finite) dB.
It's not a bug, it's a feature! From an analog perspective, a good place for the midpoint would have been between the two smallest symbols and both ranges would have been in perfect symmetry. Silence would then be encoded as some form of noise alternating between the smallest symbols, which is fine, since PCM encoding doesn't make any promise better than that. Giving '0' the privileged meaning of 'digital silence', at the cost of one usual symbol in the positive range, enables scenarios where you signal something like: "don't try to replicate my primitive approximation of silence but replace it with the best silence you have available". The cost (1 symbol) isn't significant in contrast to the gained possibility.
In practice you dither, so a privileged '0' symbol is unnecessary, since silence is encoded as noise anyway. Some dithering tools have something like an "auto-black" feature, though.
QUOTE I don't see the problem. PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
[-1.0, 1.0] is a perfectly fine range for PCM encoding in float representation. Why should artificial constraints from a legacy storage format be carried over to a better format, which doesn't benefit from that constraint in any way? Conversion to and from [-1, 1] isn't black magic, after all.
This post has been edited by googlebot: May 30 2011, 13:12
|
|
|
May 30 2011, 18:15
Post
#40
|
|
![]() Group: Developer Posts: 304 Joined: 29-April 11 From: Austria Member No.: 90198 |
QUOTE [-1.0, 1.0] is a perfectly fine range for PCM encoding in float representation.
I (we?) were talking about the PCM format (format tag 1) in RIFF WAVE files, which is integer only, and the normalization issue. With normalized floats (format tag 3) the range is of course, like you posted, -1.0 <= y <= +1.0 and normalization is not needed.
I don't think those non-floating point formats are legacy at all. I know a couple of recording engineers that do not use floats as storage format. And I think it's common practice to keep the level at least a fraction of a dB below full scale. Even if you're only 0.01 dB below full scale you're down to something like 32730 with 16-bit integers.
This post has been edited by xnor: May 30 2011, 18:41
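The headroom figure can be checked with a few lines of Python. This is a sketch, assuming full scale is taken as 32767 and the result is truncated to a whole code; `peak_code` is a hypothetical helper name.

```python
def peak_code(db_below_full_scale, nbits=16):
    """Largest integer code for a peak the given number of dB below full scale."""
    full_scale = (1 << (nbits - 1)) - 1             # 32767 for 16 bits
    gain = 10.0 ** (-db_below_full_scale / 20.0)    # dB to amplitude ratio
    return int(full_scale * gain)                   # truncate to a whole code


print(peak_code(0.01))
print(peak_code(0.0))   # 32767
```

Under those assumptions, 0.01 dB of headroom already leaves a few dozen of the top codes unused, which is in line with the "something like 32730" estimate.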
|
|
|
May 30 2011, 18:40
Post
#41
|
|
![]() Group: Super Moderator Posts: 3268 Joined: 26-July 02 From: princegeorge.ca Member No.: 2796 |
Once again I put in my amateur two-bits worth and receive sound instruction in return. I <3 you guys. </off-topic>
|
|
|
|
May 30 2011, 20:37
Post
#42
|
|
![]() Group: Members (Donating) Posts: 1983 Joined: 4-January 04 From: Austin, TX Member No.: 10933 |
Yeah I also simplified jj's Octave code (into something like 10 lines IIRC), and always attempted to use native Octave functions wherever I could, and it still ran like a dog.
mmmm... Python for numeric work. Crunchy. |
|
|
|
May 30 2011, 20:49
Post
#43
|
|
![]() Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE Yeah I also simplified jj's Octave code (into something like 10 lines IIRC), and always attempted to use native Octave functions wherever I could, and it still ran like a dog. mmmm... Python for numeric work. Crunchy.
Nah, dogs run fast. It's not that fast.
-------------------- -----
J. D. (jj) Johnston |
|
|
|
May 31 2011, 07:49
Post
#44
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
QUOTE PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1). Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
This approach introduces a new mathematics, where inversion is non-linear, which is madness, or at least highly undesirable. Either the analogue signal is biased with ½ LSB, in which case the digital signal range is -32767 to +32767 (-32768 can never occur and would clip if sent to a corresponding DAC), or the digital signal is biased with ½ LSB, in which case the inverse of -32768 is 32767. Note that even though Microsoft claims that the midpoint is zero, a WAV file cannot know how your ADC is biased-up.
QUOTE That is correct, and proper. That is, after all, the definition. Yes. Really.
Can you provide a source for the definition?
This post has been edited by bandpass: May 31 2011, 07:58
|
|
|
May 31 2011, 08:03
Post
#45
|
|
![]() lossyWAV Developer Group: Developer Posts: 1722 Joined: 11-April 07 From: Wherever here is Member No.: 42400 |
.... but can you hear a 0.5 lsb offset?
-------------------- lossyWAV -q X | FLAC -8 ~= 308kbps
SGS III (Rooted) + 64GB |
|
|
|
May 31 2011, 08:24
Post
#46
|
|
![]() Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957 |
QUOTE Can you provide a source for the definition?
It goes all the way back to about 1960. I don't recall it presently, but in fact zero is zero. It all boils down to that.
There have been a variety of scalings for fixed-to-float conversion, but the most common is that -1 is the largest negative value. Given the reality of integer 2's complement math, that's really how it all works out. For sign-magnitude integers, you wind up with one zero with two codes for it in the integer.
It is possible to do midrise quantizers instead of midtread quantizers, but then the following bites you: when you start to do integer math and floating point math and expect something to work out the same way, you have to have zero be in fact zero, and nothing else. Otherwise you have very different domains for your signals.
-------------------- -----
J. D. (jj) Johnston |
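The difference between the two quantizer families mentioned in this post can be sketched in Python. These are textbook-style uniform quantizers written for illustration, not code from the thread; the names `midtread`, `midrise` and the `step` parameter are assumptions.

```python
import math


def midtread(x, step=1.0):
    """Mid-tread quantizer: levels at ..., -step, 0, +step, ... (zero exists)."""
    return step * math.floor(x / step + 0.5)


def midrise(x, step=1.0):
    """Mid-rise quantizer: levels at ..., -step/2, +step/2, ... (no zero)."""
    return step * (math.floor(x / step) + 0.5)


for x in (-0.2, 0.2):
    print(x, midtread(x), midrise(x))
```

Small inputs on either side of zero collapse onto the zero level in the mid-tread case, while the mid-rise quantizer sends them to -step/2 and +step/2, so it has no code that means exactly zero.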
|
|
|
May 31 2011, 08:26
Post
#47
|
|
![]() Group: Members (Donating) Posts: 1983 Joined: 4-January 04 From: Austin, TX Member No.: 10933 |
0.5lsb is an issue for spectrum analysis, and in principle, might also exacerbate potential stability issues in lowpass filters. But more importantly, IT'S JUST WRONG.
QUOTE PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1). Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.
QUOTE That is correct, and proper. That is, after all, the definition. Yes. Really.
QUOTE Can you provide a source for the definition?
The wikipedia entry for two's complement arithmetic?
This post has been edited by Axon: May 31 2011, 08:38
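The "invert all the bits, and add 1" rule can be tried out in Python by emulating a fixed-width register. This is a sketch assuming 16-bit two's complement; `negate_twos_complement` is a hypothetical helper name.

```python
def negate_twos_complement(code, nbits=16):
    """Negate by inverting all bits and adding 1, wrapping to nbits."""
    mask = (1 << nbits) - 1
    raw = ((~code) + 1) & mask               # invert bits, add one, wrap
    # Reinterpret the unsigned bit pattern as a signed value.
    return raw - (1 << nbits) if raw >> (nbits - 1) else raw


print(negate_twos_complement(1))        # -1
print(negate_twos_complement(-1))       # 1
print(negate_twos_complement(-32768))   # -32768
```

Every code except the most negative one negates as expected; -32768 wraps back onto itself because +32768 does not fit in 16 bits.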
|
|
|
May 31 2011, 08:30
Post
#48
|
|
![]() Group: Members (Donating) Posts: 1983 Joined: 4-January 04 From: Austin, TX Member No.: 10933 |
And just to flesh this discussion out some, yes, having a negative value which cannot be inverted to a positive value *is* the cleanest and most efficient solution. Unless anybody here would instead prefer negative zero. Hands?
|
|
|
|
May 31 2011, 09:19
Post
#49
|
|
|
Group: Developer Posts: 618 Joined: 6-December 08 From: Erlangen Germany Member No.: 64012 |
QUOTE PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1). Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
QUOTE Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.
Meaning all -1s are inverted to 2, and there will be no 1s?
Btw, the way xnor described it is how e.g. Audition inverts.
Chris
-------------------- If I don't reply to your reply, it means I agree with you.
|
|
|
|
May 31 2011, 10:13
Post
#50
|
|
![]() Group: Members Posts: 266 Joined: 3-August 08 From: UK Member No.: 56644 |
QUOTE 0.5lsb is an issue for spectrum analysis, and in principle, might also exacerbate potential stability issues in lowpass filters. But more importantly, IT'S JUST WRONG.
Indeed, hence the discussion: it's a small but annoying issue if it's not handled consistently.
QUOTE PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1). Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
QUOTE Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.
But that's also highly undesirable for DSP. At the ADC, if there is no bias, inverting an analogue signal that converts to -32768 would produce an analogue signal that converts to 32767; digital inversion should give the same result (at the DAC output that is).
QUOTE That is correct, and proper. That is, after all, the definition. Yes. Really.
QUOTE Can you provide a source for the definition?
QUOTE The wikipedia entry for two's complement arithmetic?
It doesn't mention ADC/DAC biasing. A better place to look might be IEC 60908 or somesuch.
If there is no ADC bias (and 16-bit ADC values are stored unmodified or with just the top bit flipped), then a valid DSP solution is:
CODE
float dsp_sample = (adc_sample + 0.5) / 32767.5;
If there is ½ LSB ADC bias then a valid DSP solution is:
CODE
float dsp_sample = adc_sample / 32767.0;
and -32768 is an unused value. The code:
CODE
float dsp_sample = adc_sample / 32768.0;
doesn't seem to map to any real world ADC scenario.
In practice, as has been mentioned, recordings are made with headroom and probably have any DC-offset (w.r.t. digital 0) removed with post-processing; this however has the same result as biasing the ADC, which again means that -32768 should be an unused value.
This post has been edited by bandpass: May 31 2011, 10:36
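The three scalings in this post can be compared side by side with a short Python sketch. The function names are hypothetical labels for the three one-liners above, and 16-bit samples are assumed throughout.

```python
def scale_unbiased(code):
    """No ADC bias assumed: (code + 0.5) / 32767.5, spanning exactly [-1, 1]."""
    return (code + 0.5) / 32767.5


def scale_biased(code):
    """Half-LSB ADC bias assumed: code / 32767, with -32768 left unused."""
    return code / 32767.0


def scale_power_of_two(code):
    """Divide by 2**15: zero stays zero, but +1.0 is never reached."""
    return code / 32768.0


for code in (-32768, -32767, 0, 32767):
    print(code, scale_unbiased(code), scale_biased(code), scale_power_of_two(code))
```

The printed rows show the trade-off being argued about: the first mapping is symmetric but has no exact zero, the second keeps zero and symmetry at the cost of pushing -32768 outside [-1, 1], and the third keeps zero while leaving the positive side one step short of 1.0.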
|
|
|