Bandpass RMS for audio content matching
klonuo
post Dec 1 2010, 00:47
Post #1





Group: Members
Posts: 258
Joined: 29-April 10
Member No.: 80274



Let's say we divide¹ the frequency range into octaves (the middle 9 of 11), then bandpass each range and calculate the RMS per range
Then we do some kind of normalization (maybe using ReplayGain values) and store these values

I don't know if it can be done in one pass, so it may be computationally hungry, not optimized, etc., but don't you think these normalized per-band RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently of course; I'm not sure what would be optimal here
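
A minimal sketch of the band layout described above, assuming the 11 octaves are centred on 1 kHz times powers of two (roughly 16 Hz to 16 kHz) with edges half an octave either side of each centre. That layout is an assumption for illustration, since the post does not fix the exact ranges, and `octave_bands` is a hypothetical helper name.

```python
import math

def octave_bands(n_bands=11, ref_hz=1000.0, lowest_offset=-6):
    """Return (low, center, high) in Hz for consecutive octave bands.

    Centers are ref_hz * 2**k; edges sit half an octave either side.
    These defaults are an assumed layout, not taken from the thread.
    """
    bands = []
    for k in range(lowest_offset, lowest_offset + n_bands):
        center = ref_hz * 2.0 ** k
        bands.append((center / math.sqrt(2), center, center * math.sqrt(2)))
    return bands

bands = octave_bands()
middle = bands[1:-1]  # the "middle 9 of 11": drop the lowest and highest octave

for low, center, high in middle:
    print(f"{low:9.1f} {center:9.1f} {high:9.1f}")
```

Each band's upper edge is the next band's lower edge, so the bands tile the range without gaps or overlap.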
JapanAudio
post Dec 1 2010, 01:14
Post #2





Group: Members
Posts: 86
Joined: 3-November 10
Member No.: 85187



What type of content matching? Do you mean like Shazam: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

They do fingerprint analysis with "constellation maps".
klonuo
post Dec 1 2010, 01:20
Post #3








Yeah I know, musical islands and heavy statistics, over my head
I was just thinking simple, and this didn't seem like a bad idea, so I posted
Arnold B. Kruege...
post Dec 1 2010, 01:25
Post #4





Group: Members
Posts: 3536
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (klonuo @ Nov 30 2010, 18:47) *
Let's say we divide¹ the frequency range into octaves (the middle 9 of 11), then bandpass each range and calculate the RMS per range
Then we do some kind of normalization (maybe using ReplayGain values) and store these values

I don't know if it can be done in one pass, so it may be computationally hungry, not optimized, etc., but don't you think these normalized per-band RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently of course; I'm not sure what would be optimal here


Something like this has long been done to increase the apparent loudness of recordings. It is called multi-band compression.
JapanAudio
post Dec 1 2010, 01:40
Post #5








QUOTE (klonuo @ Nov 30 2010, 19:20) *
Yeah I know, musical islands and heavy statistics, over my head
I was just thinking simple, and this didn't seem like a bad idea, so I posted

It's not a bad idea but if you're gonna do content matching you will have to consider the probabilities of a match at some point...
klonuo
post Dec 1 2010, 02:00
Post #6








Maybe the post title isn't good
I was listening to Gobi by Monolake and thought to continue in that direction, then my laziness thought about similarity
Strictly speaking, "content matching" was not on my mind, but it may be interesting too
klonuo
post Dec 1 2010, 02:52
Post #7








QUOTE (JapanAudio @ Dec 1 2010, 01:40) *
you will have to consider the probabilities of a match at some point...

"Matching" can be implemented as one of the hypothesis-testing comparisons with a custom error level (the number of data points is small for assuming a normal distribution, but IIRC hypothesis testing worked even with fewer than 10 samples), which is easy and cheap
Maybe even some predefined curves could be estimated for music styles, but that may be too ambitious

I think I'll try this (one of these days)

[edit] Correlation analysis should be even easier and more sane than my nonsensical hypothesis testing
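
The correlation idea could look roughly like this. The two profiles are made-up per-band RMS values in dB, not measurements, and `pearson` is a hypothetical helper written from the textbook formula.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 9-band RMS profiles in dB (illustrative values only).
track_a = [-50.0, -44.0, -38.0, -33.0, -30.0, -31.0, -35.0, -40.0, -48.0]
track_b = [-47.0, -45.0, -36.0, -34.0, -27.0, -30.0, -37.0, -39.0, -50.0]

r_ab = pearson(track_a, track_b)
print(f"r = {r_ab:.3f}")
```

Pearson's r ignores a constant offset and positive scaling, so adding the same gain in dB to every band of one track leaves the coefficient unchanged.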

This post has been edited by klonuo: Dec 1 2010, 03:05
klonuo
post Dec 3 2010, 07:50
Post #8








There were no negative replies, so I made a rough start to see what I could find there (being almost DSP illiterate)
I chose to start with sox, as the easiest way for me. Here is the commented batch file I used: http://pastebin.com/n74i2BZh

As commented inside the script, it outputs a .stats file and an optional gnuplot script for visualizing the data, which produces two pictures: filename_RMS.png and filename_Crest.png:



The .stats file looks like this (space-delimited columns, levels in dB):
CODE
Band_# RMS_lev Left Right RMS_pk Left Right Pk_lev Left Right CFLeft CFRight
01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 -32.26 -32.26 -33.41 14.11 12.32
02_band= -49.25 -49.26 -49.23 -33.79 -33.79 -35.06 -26.31 -26.31 -27.45 14.04 12.27
03_band= -43.35 -43.34 -43.35 -27.98 -27.98 -29.18 -20.56 -20.56 -21.71 13.77 12.08
04_band= -37.63 -37.55 -37.70 -22.66 -22.66 -23.61 -15.29 -15.29 -16.69 12.98 11.24
05_band= -32.36 -32.01 -32.75 -19.63 -19.63 -20.19 -11.56 -11.56 -12.28 10.53 10.56
06_band= -28.40 -27.85 -29.03 -12.92 -12.92 -17.30 -6.85 -6.85 -9.31 11.22 9.68
07_band= -29.98 -29.67 -30.31 -18.55 -18.86 -18.55 -8.27 -8.27 -8.36 11.75 12.52
08_band= -33.83 -33.34 -34.38 -20.47 -21.81 -20.47 -9.66 -9.94 -9.66 14.79 17.21
09_band= -36.62 -36.47 -36.76 -19.50 -19.50 -19.67 -8.88 -10.35 -8.88 20.23 24.77
10_band= -44.56 -45.65 -43.70 -24.12 -26.54 -24.12 -12.68 -14.36 -12.68 36.71 35.55
11_band= -57.21 -57.90 -56.62 -38.47 -39.81 -38.47 -22.43 -25.76 -22.43 40.49 51.22
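
The crest-factor columns appear to be the linear peak-to-RMS ratio per channel, i.e. 10^((Pk_lev - RMS_lev)/20). The sketch below recomputes them for three rows copied from the table above. The column order is read off the header line, and `crest_from_db` is a hypothetical helper, not part of the original script.

```python
ROWS = """\
01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 -32.26 -32.26 -33.41 14.11 12.32
06_band= -28.40 -27.85 -29.03 -12.92 -12.92 -17.30 -6.85 -6.85 -9.31 11.22 9.68
11_band= -57.21 -57.90 -56.62 -38.47 -39.81 -38.47 -22.43 -25.76 -22.43 40.49 51.22
"""

def crest_from_db(peak_db, rms_db):
    """Linear crest factor from peak and RMS levels given in dB."""
    return 10.0 ** ((peak_db - rms_db) / 20.0)

results = []
for line in ROWS.splitlines():
    cols = line.split()
    band = cols[0].rstrip("=")
    rms_left, rms_right = float(cols[2]), float(cols[3])   # RMS_lev Left/Right
    pk_left, pk_right = float(cols[8]), float(cols[9])     # Pk_lev Left/Right
    cf_left, cf_right = float(cols[10]), float(cols[11])   # CFLeft/CFRight
    results.append((band,
                    crest_from_db(pk_left, rms_left), cf_left,
                    crest_from_db(pk_right, rms_right), cf_right))

for band, calc_l, file_l, calc_r, file_r in results:
    print(f"{band}: left {calc_l:.2f} (file {file_l}), "
          f"right {calc_r:.2f} (file {file_r})")
```

The recomputed values should agree with the file to within the rounding of the two-decimal dB figures.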

I tried to make all this as obvious as possible and to work as expected

Here are some cross-referenced tables with correlation coefficients for the normalized average RMS over the full 11 octave bands:

"Hysteria" by Def Leppard vs itself:


The table could suggest that track 4 (ballad) and track 13 (live track) are the worst match
The tracks that match most are tracks 2, 3 and 11

So this looks as expected when analysing the same release

Some low coefficients between "Hysteria" and Ornette Coleman's "Shape of Jazz to Come":


An interesting match in the last tracks from Gustav Holst's "Planets" and "Shape of Jazz to Come":


"The Planets" self-reference table:


I hope this isn't "see what you want to see", as I don't have much data right now because the process is slow, but I wanted to post and maybe get some tips. I plan to try different bands, maybe other approaches and the like. Regarding Arnold's comment about the multi-band compressor: with sox it is possible to split bands in one pass with crossover filters, as "mcompand" does, but I don't think it can be used here (or I don't know how).
klonuo
post Dec 9 2010, 21:53
Post #9








The saga continues...

I sort of found a one-pass solution by using Bidule and building bandpass filters from Christian Budde's Chebyshev LP/HP VST filters and a couple of Destroy FX open-source RMS Buddy plugins:


Making a steep bandpass like sox's sinc filter was a bit of a challenge for me (4-6 kHz):


Current solution (16th-order Chebyshev VST):


It's not faster than sox (although it's done in one pass), but it uses almost no processor and in off-line mode it is faster than real time (playing the track).

This kind of usage suggests an interesting idea: a VST or DSP effect that would calculate these values on the go (accumulating the average RMS while listening to the track) and then, at chosen intervals, let's say 50%, 75% and/or at track end, present similar tracks based on correlation coefficients calculated against values already stored in a database. The processing cost would be negligible this way. Of course, this only holds if the assumption that similarity can be based on RMS bands is valid (perhaps some new variable should be introduced as well)
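
The accumulate-while-listening part could be sketched as below, assuming the audio has already been split into bands by some filter bank (not shown) and arrives in blocks. `RunningBandRMS` is a hypothetical name, and the input is a synthetic sine plus silence, not real data.

```python
import math

class RunningBandRMS:
    """Accumulate sum of squares per band so the average RMS can be read at any time."""

    def __init__(self, n_bands):
        self.sum_sq = [0.0] * n_bands
        self.count = [0] * n_bands

    def update(self, band_blocks):
        """band_blocks: one list of already band-filtered samples per band."""
        for i, block in enumerate(band_blocks):
            self.sum_sq[i] += sum(s * s for s in block)
            self.count[i] += len(block)

    def rms_db(self):
        """Average RMS level per band in dB relative to full scale (1.0)."""
        levels = []
        for total, n in zip(self.sum_sq, self.count):
            if n == 0 or total == 0.0:
                levels.append(float("-inf"))
            else:
                # 10*log10(mean square) equals 20*log10(RMS)
                levels.append(10.0 * math.log10(total / n))
        return levels

# Synthetic input: band 0 carries a full-scale sine, band 1 is silent.
sine = [math.sin(2 * math.pi * 10 * t / 1000) for t in range(1000)]
silence = [0.0] * 1000

meter = RunningBandRMS(2)
meter.update([sine[:500], silence[:500]])
halfway = meter.rms_db()   # snapshot at 50% of the "track"
meter.update([sine[500:], silence[500:]])
final = meter.rms_db()     # snapshot at track end
print(halfway, final)
```

Only two numbers per band are kept between blocks, so a snapshot at any point of the track costs almost nothing.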

On the day of my previous post I tried 6 bands:
CODE
#     LP    HP
1:    16    60
2:    60   250
3:   250  2000
4:  2000  4000
5:  4000  6000
6:  6000 16000


The few results I gathered were similar to (matched) the previous 11-band data. The problem I ran into with 6 sample points is that only linear correlation coefficients are meaningful; with rank-based measures such as Spearman or Kendall (tau), tried because the data is non-linear, the results are not as expected. But looking for linear correlation seems the right way when looking at the RMS bands on a reference impulse:
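
For comparing the two kinds of coefficient on 6 points, a sketch along these lines may help. Spearman's rho is computed here as Pearson's r on the ranks. The two 6-band profiles are invented dB values for illustration, not results from this thread, and all helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for j in range(pos, end + 1):
            result[order[j]] = average
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented 6-band RMS profiles in dB (illustrative only).
profile_a = [-40.0, -30.0, -25.0, -28.0, -33.0, -45.0]
profile_b = [-42.0, -31.0, -27.0, -26.0, -35.0, -44.0]

print(f"Pearson  r   = {pearson(profile_a, profile_b):.3f}")
print(f"Spearman rho = {spearman(profile_a, profile_b):.3f}")
```

With only 6 untied points the sum of squared rank differences is an even number between 0 and 70, so rho can take at most 36 distinct values. That coarseness may be part of why rank-based results look odd at this size.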



I'll pause here, and come back if there is at least minor interest other than mine
romor
post Feb 8 2012, 14:58
Post #10





Group: Members
Posts: 650
Joined: 16-January 09
Member No.: 65630



Apart from some talk, to me (DSP noob) this seems like fine approach

There is a lot of information out there, and in this case I don't think it's hidden as with some aspects of physics, but real practice (after the necessary background) is needed to grasp the meaning of some of it.
Having in mind the recent JJ FFT talks, I don't really think it's possible to present the meaning of FFT (or anything DSP related) to someone that doesn't understand basic mathematical physics concepts (such as integral transforms). There is no "need" and there is no sense, it's like pumping botox.

But this seems like a nice, easily comprehensible and intuitive approach. Find the right bins (from critical bands, like http://www.independentrecording.net/irn/re...in_display.htm), then get the energy, and output numbers. Correlation magic could lead to very interesting things I think
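
The bins-then-energy recipe could be sketched as below. To stay self-contained it uses a naive DFT rather than an FFT library, the band edges are arbitrary example values rather than critical bands, and `half_spectrum` and `band_mean_square` are hypothetical helper names.

```python
import cmath
import math

def half_spectrum(frame):
    """Naive DFT, bins 0..N/2 only (enough for a real-valued frame)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def band_mean_square(frame, sample_rate, bands):
    """Mean-square energy per (low_hz, high_hz) band, via Parseval's theorem."""
    n = len(frame)
    spectrum = half_spectrum(frame)
    energies = []
    for low, high in bands:
        total = 0.0
        for k, x in enumerate(spectrum):
            freq = k * sample_rate / n
            if low <= freq < high:
                # DC and (for even n) Nyquist occur once; other bins have a mirror image.
                weight = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
                total += weight * abs(x) ** 2
        energies.append(total / n ** 2)
    return energies

SAMPLE_RATE = 8000
# Synthetic test frame: a full-scale 1 kHz sine, which falls exactly on a bin.
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
bands = [(0, 500), (500, 2000), (2000, 4000)]  # arbitrary example edges in Hz

energies = band_mean_square(frame, SAMPLE_RATE, bands)
levels_db = [10 * math.log10(e) if e > 0 else float("-inf") for e in energies]
print(levels_db)
```

Summing squared bin magnitudes inside a band gives the same mean-square energy an ideal brick-wall bandpass would pass, so the per-band numbers can go straight into a correlation.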

This post has been edited by romor: Feb 8 2012, 15:00


--------------------
scripts: http://goo.gl/M1qVLQ