Topic: Interesting Histograms.

Interesting Histograms.

Reply #50
Oh, and yes, Octave is astonishingly slow, even when you only do the histogram part. In fact, the SFM stuff adds surprisingly little to the run time, which is bizarre.

Even without bounds checking it's slow, slow, slow. No idea why.

Matlab is quite a bit faster, but I don't have it here.

It seems that array creation is the main bottleneck here. I tried to speed up the histogram, and this code:

Code: [Select]
def histo(fn):
    import numpy as np
    import scikits.audiolab as au
    from matplotlib.pyplot import semilogy

    # read the WAV file: sp holds the samples scaled to [-1, 1), sf the sample rate
    (sp, sf, b) = au.wavread(fn)

    sp = sp.transpose()                        # one row per channel
    his = np.zeros((65536,), dtype=np.int64)   # one bin per 16-bit code

    for i in range(len(sp[0])):
        his[int(sp[0, i] * 32768) + 32768] += 1
        his[int(sp[1, i] * 32768) + 32768] += 1

    # normalize to probabilities and clamp so the log plot never hits zero
    his = np.maximum(his / float(np.sum(his)), 10e-12)
    x = np.arange(-32768, 32768)
    semilogy(x, his)


is ~15 times faster than Octave, or ~3 times faster than the previously posted version.

This is bare-bones NumPy - simple array processing - so other than Python and NumPy nothing more is required (except SciPy or audiolab for reading the WAV data; edit: and matplotlib for the graphs).
An MKL build makes no noticeable difference here, as no linalg functions are involved. The code can of course be optimized further by various approaches, e.g.: http://www.scipy.org/PerformancePython The example there is entirely different (a scheme on a grid, testing its convergence), but it shows how NumPy/SciPy code can be dissected further. I just thought to link to it, as I'm not sure how many users here are familiar with this Python approach to numerical problems.
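One of the simplest optimizations along those lines is to drop the Python loop entirely and let NumPy do the binning with bincount(). A rough sketch (the histo_vec name is just for illustration, the same audiolab wavread and 16-bit stereo input is assumed as above, and I haven't benchmarked it, so treat any speed-up as a guess):

Code: [Select]
def histo_vec(fn):
    # illustrative sketch, not benchmarked
    import numpy as np
    import scikits.audiolab as au
    from matplotlib.pyplot import semilogy

    (sp, sf, b) = au.wavread(fn)       # samples scaled to [-1, 1), one column per channel

    # map both channels back to integer codes 0..65535 in one shot
    codes = (sp * 32768).astype(np.int64).ravel() + 32768

    # bincount does the whole histogram without a Python-level loop
    his = np.bincount(codes, minlength=65536)

    his = np.maximum(his / float(his.sum()), 1e-12)
    semilogy(np.arange(-32768, 32768), his)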

I'm also wondering about Matlab performance. My main PC's PSU died, so I can't test Matlab 2009b these days. Can someone compare performance between Octave and Matlab, just for the histogram code:

Code: [Select]
fname = 'some.wav';
x = wavread(fname);
x = x';
len = length(x)
his(1:65536) = 0;
for ii = 1:len
    for jj = 1:2
        t = x(jj, ii);
        t = round(t * 32768 + 32769);
        his(t) = his(t) + 1;
    end
end

Interesting Histograms.

Reply #51
Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.

I was just describing that inverting negative full-scale will result in a 16-bit PCM value of 32767.
You're talking about the implementation. If you blindly add 1 you run into the same overflow problem you just described.
(Nitpicking..  )
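A quick way to see both points, using NumPy's int16 as a stand-in for 16-bit PCM (just an illustration):

Code: [Select]
import numpy as np

x = np.int16(-32768)
print(-x)   # still -32768: plain negation overflows in 16-bit two's complement
print(~x)   # 32767: flipping the bits alone lands on positive full scale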


In practice, as has been mentioned, recordings are made with headroom and probably have any DC-offset (w.r.t. digital 0) removed with post-processing; this however has the same result as biasing the ADC, which again means that -32768 should be an unused value.


Well it's not unused in practice, just take a look at some recordings, DAWs etc.

And here's an excerpt from the CS5550 ADC datasheet:

Quote
These signed registers contain the last value of the measured results of AIN1 and AIN2. The results will be within the range of -1.0 ≤ AIN1, AIN2 < 1.0. The value is represented in two's complement notation, with the binary point placed to the right of the MSB (the MSB has a negative weighting). These values are 22 bits in length. The two least significant bits (located at the far right side) have no meaning, and will always have a value of "0".


ADS1232:
Quote
A positive full-scale input produces an output code of 7FFFFFh and the negative full-scale input produces an output code of 800000h.
[...]
Ideal Output Code vs. Input Signal: 0 V in = 000000h out
"I hear it when I see it."

Interesting Histograms.

Reply #52
Code: [Select]
float dsp_sample = adc_sample / 32768.0;

doesn't seem to map to any real-world ADC scenario.

In practice, as has been mentioned, recordings are made with headroom and probably have any DC-offset (w.r.t. digital 0) removed with post-processing; this however has the same result as biasing the ADC, which again means that -32768 should be an unused value.


I must differ: zero must be zero. Otherwise how do you justify converting filter coefficients between fixed point and float, etc.? Consider the differences in rounding and how they may work out in terms of zeros when you can NOT represent ZERO in the fixed-point case.

It's ugly, ugly, ugly. You must have zero, and don't forget, two's complement has a single zero. Ergo, you need to map that zero to zero.

Introducing ADC's is simply a complete distraction here. Zero is zero. Not "almost zero".
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #53
I hope I'm not bothering you all with my Python snippets, but this morning I tried to replace the slow loop with Fortran code (the first time I've tried this), and the result outperformed Octave by a factor of roughly 3000.

Back to back:

Here is Fortran code, which was used as fhist module within Python: http://pastebin.com/suMgN5QB

I guess inline C/C++ with scipy.weave could give a similar result.

Interesting Histograms.

Reply #54
I hope I'm not bothering you all with my Python snippets, but this morning I tried to replace the slow loop with Fortran code (the first time I've tried this), and the result outperformed Octave by a factor of roughly 3000.

Back to back: http://i.imgur.com/9dupd.png
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #55
I'm not sure I follow you, but as mentioned, my guess is that array creation within that loop (while iterating over an array) is what slows things down.
I also have no idea what Octave is doing at that level, or NumPy for that matter, but as suggested in the performance link I posted, it's not that hard to inline C/C++ (or slightly harder to do it with Fortran) and gain an unbelievable speed-up while staying in a friendly environment.
Even if it's just about a couple of plots, a simple extension while writing the script seems worth the effort.

Of course, I'm not suggesting you install Python/NumPy and everything else needed if you don't already have it or aren't familiar with it; I'm just showing a possibility.

Matlab should perform better than Octave, but I'm guessing it would still be very slow.

Interesting Histograms.

Reply #56
I'm not sure I follow you, but as mentioned, my guess is that array creation within that loop (while iterating over an array) is what slows things down.


Except that I create the arrays outside the loop and then iterate over them.  It's well known that repeatedly growing an array is a disaster, but I'm not doing that.

Maybe octave is, for some reason, but one would hope that it would take the original his(bounds)=0 and allocate the whole array then and there, since I am setting the whole array to a value.

Array creation (i.e. malloc-ing it) is not the same as iterating over an already created array, which is what I set up.
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #57
Quote
A positive full-scale input produces an  output  code  of  7FFFFFh


Yes, but a lower voltage might also produce the same value.  And in my experience, IC data sheets are frequently inaccurate; here's one from Analog Devices (AD5399) which immediately contradicts itself:



Introducing ADC's is simply a complete distraction here.

How so? The numbers in a wav file typically have their origins in an ADC and will be presented to a DAC.

If your ADC works like this:



then you don't have a central value to label as zero (in two's complement).

If you bias by ½ LSB:



you cause (effective) clipping to occur at the top end.  And there are other possible ways that it can be done.

Interesting Histograms.

Reply #58
Introducing ADC's is simply a complete distraction here.

How so? The numbers in a wav file typically have their origins in an ADC and will be presented to a DAC.



The problem comes about when doing signal processing or transmission with format changes, or when the actual meaning of the PCM signal is the issue.

The ½ LSB offset just does not matter at all in the real world. The difference in clipping levels is beyond minuscule.  It's a clipping-level change of 2e-4 dB.

That is utterly, completely, and without a doubt irrelevant.

And, once more, with feeling, the DEFINITION of the integer PCM signal is a midtread quantizer, NOT a midriser quantizer.  Can we please just stick to the definitions that were set up somewhere around 1960? Please?
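A minimal sketch of that distinction (illustration only, with an arbitrary unit step): a midtread quantizer has a reconstruction level centred on zero, so zero in gives zero out, while a midrise quantizer puts a decision threshold at zero and has no zero output at all.

Code: [Select]
import numpy as np

def midtread(x, step):
    # levels at ..., -step, 0, +step, ...  : zero maps to zero
    return step * np.round(x / step)

def midrise(x, step):
    # levels at ..., -step/2, +step/2, ... : there is no zero level
    return step * (np.floor(x / step) + 0.5)

x = np.array([0.0, 0.2, -0.2, 0.6])
print(midtread(x, 1.0))   # [ 0.  0. -0.  1.]
print(midrise(x, 1.0))    # [ 0.5  0.5 -0.5  0.5]

Integer PCM, by the definition referred to above, is midtread: the all-zeros code represents exactly zero.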
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #59
I'm not sure I follow you, but as mentioned, my guess is that array creation within that loop (while iterating over an array) is what slows things down.


Except that I create the arrays outside the loop and then iterate over them.  It's well known that repeatedly growing an array is a disaster, but I'm not doing that.

Maybe octave is, for some reason, but one would hope that it would take the original his(bounds)=0 and allocate the whole array then and there, since I am setting the whole array to a value.

Array creation (i.e. malloc-ing it) is not the same as iterating over an already created array, which is what I set up.

That was poor wording on my part; I thought about editing it after posting yesterday, but left it. I meant that on every loop iteration the whole 'his' array is re-created, not just indexed - but that is a wild guess.

So, every new day, new things learned. I got a tip that there is already a function in Octave that does the hard work - histc(). So, for the sake of completeness in the discussion of why Octave is so slow with these interesting histograms, this speeds it up by a couple of orders of magnitude (~150x):

Code: [Select]
x=wavread(filename);
x=x';

t = round(x(:) * 32768 + 32769);
his = histc(t, 1:65536)';

his=max(his/sum(his), 1e-12);

xax=-32768:32767;

semilogy(xax,his);
axis([-40000 40000 1e-10 1e-1])

Interesting Histograms.

Reply #60
So, every new day, new things learned. I got a tip that there is already a function in Octave that does the hard work - histc(). So, for the sake of completeness in the discussion of why Octave is so slow with these interesting histograms, this speeds it up by a couple of orders of magnitude (~150x):

Code: [Select]
x=wavread(filename);
x=x';

t = round(x(:) * 32768 + 32769);
his = histc(t, 1:65536)';

his=max(his/sum(his), 1e-12);

xax=-32768:32767;

semilogy(xax,his);
axis([-40000 40000 1e-10 1e-1])


Heck, do you even have to do that, then? (Goes to read that manual page. I know there was some reason I did not use the Matlab function when I did this before...)

Hmm.  It's a bit funky, but ought to be workable. Working.

OK, the following gets apparently identical results, and takes 10 seconds for a 7-minute song.

I declare histc a winner.

Code: [Select]
clear all
close all
clc

fname='13.wav'
x=wavread(fname);
x=round(x*32768);

len=length(x)

his=histc(x(:,1),-32768:32767); % channel 1
his=his+histc(x(:,2),-32768:32767); %channel 2

tot=sum(his);
his=his/tot;
his=max(his, .000000000001);
xax=-32768:32767;

semilogy(xax,his);
axis([-40000 40000 1e-10 1e-1]);
big=max(max(x))
small=min(min(x))

fname
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #61
Yeah, the double transposing was useless.
At the beginning I thought the various hist() functions probably weren't a good idea, since you hadn't used them in the first place - and my experience with statistics is very basic; I almost failed the exam last year.

Interesting Histograms.

Reply #62
Yeah, the double transposing was useless.
At the beginning I thought the various hist() functions probably weren't a good idea, since you hadn't used them in the first place - and my experience with statistics is very basic; I almost failed the exam last year.


I used what worked best in Matlab, at least for the versions I've timed. So much for that idea.
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #63
Quote
I've found an album that has a fair number of missing codes.
I've also found an album that has a max under ±16384 and has over 75% missing codes inside of that.
I didn't originally set up to calculate the ratio of missing codes to all codes. But I think I shall have to.

Hi all,
back to this old but interesting topic: I did the same kind of histogram as JJ, and I also found lots of missing codes in the 16-bit analysis of many tracks of various music. I'm wondering where this comes from.
Because even if the music is produced at 14 or 15 bits (?), some dither is generally used and should more or less fill the missing codes, no?

Thanks
JLO

Interesting Histograms.

Reply #64
Quote
I've found an album that has a fair number of missing codes.
I've also found an album that has a max under ±16384 and has over 75% missing codes inside of that.
I didn't originally set up to calculate the ratio of missing codes to all codes. But I think I shall have to.

Hi all,
back to this old but interesting topic: I did the same kind of histogram as JJ, and I also found lots of missing codes in the 16-bit analysis of many tracks of various music. I'm wondering where this comes from.
Because even if the music is produced at 14 or 15 bits (?), some dither is generally used and should more or less fill the missing codes, no?

Thanks
JLO


You would hope so.  But as you've noticed, there does seem to be a lot of "interesting" processing going on.
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #65
Quote
I've found an album that has a fair number of missing codes.
I've also found an album that has a max under ±16384 and has over 75% missing codes inside of that.
I didn't originally set up to calculate the ratio of missing codes to all codes. But I think I shall have to.

Hi all,
back to this old but interesting topic: I did the same kind of histogram as JJ, and I also found lots of missing codes in the 16-bit analysis of many tracks of various music. I'm wondering where this comes from.
Because even if the music is produced at 14 or 15 bits (?), some dither is generally used and should more or less fill the missing codes, no?

Thanks
JLO


You would hope so.  But as you've noticed, there does seem to be a lot of "interesting" processing going on.


What do you think is the cause of this, JJ?

Is this a consequence of perceptual coding?

Interesting Histograms.

Reply #66
Quote
Is this a consequence of perceptual coding?
I've seen missing codes on tracks that stayed in the PCM world and have never been perceptually encoded.

Interesting Histograms.

Reply #67
Quote
I've found an album that has a fair number of missing codes.
I've also found an album that has a max under ±16384 and has over 75% missing codes inside of that.
I didn't originally set up to calculate the ratio of missing codes to all codes. But I think I shall have to.

Hi all,
back to this old but interesting topic: I did the same kind of histogram as JJ, and I also found lots of missing codes in the 16-bit analysis of many tracks of various music. I'm wondering where this comes from.
Because even if the music is produced at 14 or 15 bits (?), some dither is generally used and should more or less fill the missing codes, no?

Thanks
JLO


You would hope so.  But as you've noticed, there does seem to be a lot of "interesting" processing going on.


What do you think is the cause of this, JJ?

Is this a consequence of perceptual coding?


That's the least likely source. The way any kind of perceptual coding works (well, beyond direct truncation or reduction of the signal's bit length) would not cause missing codes.

Missing codes are more likely from lack of dithering, bad ADC, or bad arithmetic in a processor. Alternatively, not enough bits in the processor.
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #68
To be sure of my histograms, I did some tests generating and analysing noise files:

- white noise generated by wavelab in 44.1kHz, 16 bits
- reduction of number of bits by distorder, recorded back in 44/16
- dithered by wavelab, with internal dither, noise type 1, noise shaping 3, output in 44/16
- histogram analysis in 16 bits and missing codes counted in percentage of actual used range

wav 16 bits : 0% missing codes
wav 15 bits : 50% missing codes
wav 14 bits : 75% missing codes
wav 13 bits : 87% missing codes
wav 13 bits dithered : 0% missing codes
wav 12 bits : 94% missing codes
wav 12 bits dithered : 0% missing codes
wav 11 bits : 97% missing codes
wav 11 bits dithered : 0% missing codes
wav 10 bits : 99% missing codes
wav 10 bits dithered : 33% missing codes

All values seem in conformity with theory.
Here is the histogram for 10 bits, also with a detailed picture: blue is undithered, red is dithered.
We see that dither is really helpful in filling missing codes!
It would now be interesting to analyse all steps of a real production, from recording to the final pressed CD.
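For anyone who wants to poke at this without Wavelab, here is a rough NumPy sketch of the same kind of experiment. The dither here is a generic TPDF dither whose width (dither_lsb, in 16-bit codes) is just a guess of mine, so the exact percentages won't match the table above, but the overall pattern should: coarse truncation leaves gaps, and a fixed-amplitude dither fills them only up to a point.

Code: [Select]
import numpy as np

def missing_codes(bits, dither_lsb=0, n=10 * 44100):
    # white noise with some headroom, as a stand-in for the generated test files
    x = np.random.uniform(-0.9, 0.9, n)
    step = 2.0 ** (16 - bits)                          # coarse step, in 16-bit codes
    codes = step * np.round(x * 32768.0 / step)        # signal truncated to 'bits' bits
    if dither_lsb:
        # TPDF dither added at the 16-bit level, roughly +-2*dither_lsb codes wide
        codes = codes + (np.random.uniform(-dither_lsb, dither_lsb, n)
                         + np.random.uniform(-dither_lsb, dither_lsb, n))
    codes = np.round(codes).astype(np.int64) + 32768   # final 16-bit requantization
    his = np.bincount(codes, minlength=65536)
    used = np.flatnonzero(his)
    missing = np.count_nonzero(his[used.min():used.max() + 1] == 0)
    return 100.0 * missing / (used.max() - used.min() + 1)

for b in (16, 13, 12, 11, 10):
    print(b, "bits:", round(missing_codes(b), 1), "% missing,",
          round(missing_codes(b, dither_lsb=10), 1), "% missing dithered")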


Interesting Histograms.

Reply #69
wav 16 bits : 0% missing codes
wav 13 bits dithered : 0% missing codes
wav 12 bits dithered : 0% missing codes
wav 11 bits dithered : 0% missing codes
wav 10 bits dithered : 33% missing codes
Isn't it strange that 10 bits dithered has 33% missing codes? Were the dither settings identical to the other versions?

Interesting Histograms.

Reply #70
Isn't it strange that 10 bits dithered has 33% missing codes? Were the dither settings identical to the other versions?

The dither was the same for all files. You can see in the lower-right picture that the dither only fills about 2/3 of the codes: the dither amplitude is not high enough to fill the missing codes of a 10-bit file, but it is enough to fill an 11-bit file. (At 16 bits an 11-bit step spans 32 codes and a 10-bit step spans 64, so a dither wide enough to cover the former can still leave a good fraction of the latter empty.)

Interesting Histograms.

Reply #71
Isn't it strange that 10 bits dithered has 33% missing codes? Were the dither settings identical to the other versions?

The dither was the same for all files. You can see in the lower-right picture that the dither only fills about 2/3 of the codes: the dither amplitude is not high enough to fill the missing codes of a 10-bit file, but it is enough to fill an 11-bit file.


Dither should be set relative to the quantization level.
-----
J. D. (jj) Johnston

Interesting Histograms.

Reply #72
Dither should be set relative to the quantization level.

If you were decreasing the number of bits then the dither should be determined by the destination bits, but if you are increasing the resolution then dither should not be applied at all, should it? In this case, why not set the dither based on the source?

Interesting Histograms.

Reply #73
Quote
Dither should be set relative to the quantization level.
The above tests are just for demo purposes, but they also show that dither amplitude and type should be set depending not only on the final quantization but also on the input signal's quantization.
In this case, this particular Wavelab dither is fine for an 11-bit > 16-bit requantization, but would have an unnecessarily high amplitude for a 15-bit > 16-bit requantization.

Interesting Histograms.

Reply #74
Quote
Dither should be set relative to the quantization level.
The above tests are just for demo purposes, but they also show that dither amplitude and type should be set depending not only on the final quantization but also on the input signal's quantization.
In this case, this particular Wavelab dither is fine for an 11-bit > 16-bit requantization, but would have an unnecessarily high amplitude for a 15-bit > 16-bit requantization.


Hmm, that would suggest that the input noise floor is not always being accurately reproduced. It is arguable if that is audible, of course.
-----
J. D. (jj) Johnston