Using LAME to detect mp3-sourced wavs?

Alright, folks. I am trying to re-invent the wheel here, or the bicycle.
I searched but couldn't really find anything directly related. If something similar has been discussed already, I have missed it.

The idea started from this thread. I was puzzled by how much the LAME VBR bitrate can jump when only a few null samples are added to a wav file, so I started to experiment a little.

It turns out that in certain situations this behaviour may be useful for detecting whether a given wav file has been decoded from an mp3.
I know we have Tau Analyzer, aucdtect and all that, but this looks rather intriguing.

[span style='font-size:8pt;line-height:100%']
Here is how I did it to produce the graphs:
* Take a 10-second wav clip from the original CDDA.
* Encode it into several mp3s using different settings (128 kbps, APS, 320 kbps API).
* Decode these mp3s back to wavs, so we have the original wav clip and the mp3-sourced wavs.
(We are going to try to determine which is which.)

* Encode every wav to mp3 again (at this stage I encoded all of them with "-V2 --vbr-new").

Now here is the trick: try different sample offsets!

* Encode not only the wav as it is, but also the wavs derived from it by some sample offset.
* Here I did it very simply: I took the wav and prepended a number of null samples at the beginning, which gives a new wav with an offset.
  (Obviously, these offset wavs are a little longer than the initial wav, but I hope this doesn't influence the results.)

* For every mp3 encoded from these "wavs with offset", take its average bitrate.

* Now plot the graphs: average_bitrate vs. wav_sample_offset.
In my case there are four graphs: one for the original cdda clip, one for the 128 kbps-decoded wav, one for the APS-decoded wav and one for the 320 kbps-decoded wav.
[/span]
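The offset-generation step above is easy to script. Below is a minimal Python sketch using only the standard `wave` module; it assumes signed PCM of 16 bits or wider (as on CDDA), where a null sample is all zero bytes (8-bit WAV is unsigned, so zero bytes would not be silence there). The names `prepend_silence` and `lame_cmd` and the file names are hypothetical; the LAME flags are simply the ones quoted above, and the sketch only builds the command line without running the encoder.

```python
import io
import wave

def prepend_silence(src, dst, offset):
    """Copy WAV `src` to `dst` with `offset` null sample frames prepended.

    Assumes signed PCM (16-bit for CDDA), where silence is all zero bytes.
    `src` and `dst` may be file names or binary file objects.
    """
    with wave.open(src, "rb") as r:
        params = r.getparams()
        audio = r.readframes(r.getnframes())
    # One sample frame = sampwidth bytes for each channel.
    silence = b"\x00" * (offset * params.sampwidth * params.nchannels)
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # the wave module fixes up the header's frame count
        w.writeframes(silence + audio)

def lame_cmd(wav_path, mp3_path):
    """Build (but do not run) the re-encode command with the flags quoted above."""
    return ["lame", "-V2", "--vbr-new", wav_path, mp3_path]

# Usage with in-memory files: pad a 100-frame 16-bit stereo clip by 10 frames.
src = io.BytesIO()
with wave.open(src, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x01\x00" * 2 * 100)
src.seek(0)
padded = io.BytesIO()
prepend_silence(src, padded, 10)
padded.seek(0)
with wave.open(padded, "rb") as r:
    print(r.getnframes())  # 110
# For a real sweep, each padded file would be passed to e.g.
# subprocess.run(lame_cmd("clip_off10.wav", "clip_off10.mp3"), check=True)
```

Writing the padded file with the source's own parameters keeps sample rate, width and channel count identical, so the only thing that changes between runs is the offset.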

And this is what it looks like (here is the same figure in higher resolution):



Curiously, the bitrate of a VBR mp3 re-encoded from an mp3 may drop significantly at zero offset or when the offset is a multiple of (1/2)*576 = 288 samples.
IIRC 576 is related to the mp3 frame size.

Here are links to zoomed-in parts of the above graph (all hi-res png, sorry):
around N = 0, zero offset
around N = 0.5 x 576 sample offset
N = 1 x 576
N = 1 x 576 more zoom
N = 1.5 x 576
N = 2 x 576
N = 2.5 x 576
N = 3 x 576

The "x576" bitrate drops are particularly obvious for the 128 kbps mp3 source.
They are also quite pronounced for the APS source (especially N = 0, 1x576, 2x576 etc.).

The bitrates produced from the 320 kbps API source are not very different from those of the original cdda source, so it may be impossible to tell these two apart (in this case). However, I would say there are more zig-zags in the 320 kbps curves around the x576 offsets, so maybe one could derive a measure of such "zig-zagginess".
As a side note, in that other thread there was 320kbps mp3 too and there the bitrate drop was huge.



I also checked what aucdtect says about it. Aucdtect guessed the above clip correctly (even the 320 kbps mp3).

However, I didn't have to look hard at all to find another sample where aucdtect was wrong: an old 128 kbps mp3, which aucdtect thinks is 100% cdda.
Here is the part of the graph for this sample - obvious bitrate drops, the "x576 effect".



So, in conclusion:

The idea of detecting mp3-decoded wavs with the LAME encoder is wild and crazy, but it may actually work. It did, at least, for some of the clips I tried. Of course, a good understanding of the effect and its limitations is necessary before any positive conclusion can be drawn.

This "x576" effect of the bitrate drop is quite curious, I think.
576 is related to the frame size, so perhaps it is understandable. Would someone with deeper knowledge of mp3 care to dig up an explanation for all this?
Maybe it has something to do with the "blockiness" of the quantized mp3 spectra?

I also wonder whether this effect is somehow, perhaps indirectly, used in the aucdtect algorithm.

All in all, more questions than answers.

EDIT: I almost forgot a very important thing: I didn't assess the quality of any of these mp3s with listening tests. One can assume that because the mp3s are encoded with the same VBR preset, the resulting mp3s are of "equal" quality, even though the average bitrate for the same music material can rise or drop with varying sample offset.

Reply #1
576 is half of 1152, the MP3-frame size.

If it's LAME we're talking about, I think it would be easier to check the length of the song in samples using foobar2000 and divide it by 588.

If the result is an integer, it's probably safe. If it is not an integer, it's definitely unsafe.
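As a rough illustration of this check, here is a minimal Python sketch that reads the length with the standard `wave` module instead of foobar2000. The 588 comes from the CD audio sector size: 2352 bytes per sector divided by 4 bytes per 16-bit stereo sample frame. The helper names are hypothetical, and as the next reply points out, passing this test proves nothing on its own.

```python
import io
import wave

# One CD audio sector holds 2352 bytes = 588 frames of 16-bit stereo (4 bytes each).
CD_SECTOR_FRAMES = 2352 // 4

def sector_aligned(wav_file):
    """True if the WAV length in sample frames is a whole number of CD sectors."""
    with wave.open(wav_file, "rb") as w:
        return w.getnframes() % CD_SECTOR_FRAMES == 0

def silent_wav(nframes):
    """Build an in-memory 16-bit stereo 44.1 kHz WAV of `nframes` silent frames (demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(b"\x00" * 4 * nframes)
    buf.seek(0)
    return buf

print(sector_aligned(silent_wav(588 * 10)))      # True: 10 whole sectors
print(sector_aligned(silent_wav(588 * 10 + 1)))  # False: one frame too long
```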

Reply #2
Quote
If it's LAME we're talking about, I think it would me easier to check the length of the song in samples using foobar2000, and divide by 588.

If the result is an integer, it's probably safe. If the result is a floating-point number, it's definitely unsafe.

Yes, you are right: if a wav is not a multiple of 588 samples, it is very suspicious, to say the least.

But in many situations a wav is a multiple of 588 even when it came from an mp3. That is why programs like aucdtect were developed.

An mp3 burned to audio CD and ripped back to wav will be a multiple of 588. A wav decoded from a LAME mp3 by foobar will have the original length, which is typically from cdda, so again it will be a multiple of 588.

Also, there may be some lossless files that were never on cdda, e.g. some recorded bootlegs.

Reply #3
Here is the effect of [span style='font-size:11pt;line-height:100%']dither[/span] on the above bitrate-offset graphs. I applied "strong ATH noise shaping" dither when decoding the mp3s with foobar.

Dithered 128 kbps mp3 compared with undithered: the characteristic bitrate drops are still there.



In the case of the dithered APS mp3 the picture gets worse, but the bitrate drops are still visible, though not so apparent.




With 320 kbps, the picture is already very unclear for the undithered mp3, and it remains so for the dithered one.
However, it is still possible to detect the characteristic 576-sample periods even in this difficult case. See my next post.

Reply #4
I suspected that the mp3 curves had a somewhat wilder "zig-zag" character than the cdda curve. In this example it was very easy to demonstrate by simply plotting the bitrate fluctuation instead of the bitrates themselves, i.e. the simple difference diff(N) = V(N+1) - V(N), where V(N) is the average bitrate at offset N. The mp3 curves clearly have 576-periodic jumps.
Even dither doesn't flatten out these jumps.
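The fluctuation measure is just a first difference of the bitrate curve. A minimal Python sketch, assuming the average bitrates are stored in a list indexed by offset N starting at 0. The `peak_spacing` helper, with its `threshold` and `min_gap` parameters, is a hypothetical addition for estimating the period automatically; in these posts the periods were read off the graphs by eye. The input below is synthetic, not a measured curve.

```python
from statistics import median

def bitrate_diff(v):
    """diff(N) = V(N+1) - V(N), where v[N] is the average bitrate at offset N."""
    return [v[n + 1] - v[n] for n in range(len(v) - 1)]

def peak_spacing(signal, threshold, min_gap=8):
    """Median distance between peaks where |signal| > threshold (hypothetical helper).

    A single bitrate drop produces two adjacent diff spikes (down, then up),
    so a spike within `min_gap` samples of the previously kept peak is skipped.
    """
    peaks = []
    for i, x in enumerate(signal):
        if abs(x) > threshold and (not peaks or i - peaks[-1] >= min_gap):
            peaks.append(i)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return median(gaps) if gaps else None

# Synthetic example (made-up numbers): 200 kbps everywhere,
# dropping to 180 kbps at every multiple of 576 samples.
v = [180.0 if n % 576 == 0 else 200.0 for n in range(2000)]
print(peak_spacing(bitrate_diff(v), threshold=5.0))  # 576
```

The median is used so that an occasional spurious or missing peak does not shift the estimate much; a real curve would still need a sensible threshold, which is exactly the hard part discussed later in the thread.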

Here is a graph for 320 kbps mp3 with strong dither (click for larger image):

The frequency spectrum of this mp3 looks very similar to the cdda one (i.e. when just looking at the spectral view in a wav editor). The high lowpass cutoff blends into the noise-shaped dither, and the spectrum looks continuous, as in cdda.

By the way, it seems to me that strong dither always (or often?) tricks aucdtect into mistaking mp3 for cdda.

In any case, it is definitely too early to generalize or claim any superiority for these bitrate-offset graphs. A more representative selection of music clips needs to be analyzed. But the result is rather encouraging.

[span style='font-size:12pt;line-height:100%']Basically, I think it should be possible to derive some kind of measure, perhaps operating in the frequency domain, that would exhibit a periodicity corresponding to the mp3 frame size.
Here I am simply using the average bitrate of re-encoded mp3s as a measure. Maybe it is possible to pin-point something in the LAME psymodel which gives greatest contribution to the effect.
[/span]
(I have no idea what this might be, or how simple or hard it may be to derive such measure.
It could be something related to the way the transform windows overlap. And maybe the LAME psymodel considers the signal less noisy when the new transforms are aligned with the old ones, hence the periodic dependence on the offset.
Just speculating...)


----------------------------------------------------------
EDIT

NOTE: I found a stupid small bug: on the above graph for the dithered 320 kbps mp3 one point is missing: for the offset = -1. It is the "zero" point of the lower 576-period on the graph. Without it, the period would break. However, result and conclusion are the same, so it is not important. Maybe I will change the graph later.

NOTE # 2: another bug, totally insignificant for the results, but a stupid one: I didn't compensate for the offset when calculating the average bitrate, so the bitrate values were skewed, increasing a little with increasing offset (as the file becomes a little bigger). The correct bitrates are actually skewed the other way: they decrease with increasing offset, because the offset is silence, which is encoded at a very low bitrate. It doesn't change anything significant on the graphs. Bitrates in all the new graphs below are compensated for the offset.
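The note describes the fix only in words. One plausible reading, which is an assumption: the average bitrate is the file size in bits divided by the duration, where the uncompensated version divides by the original clip length and the compensated one by the padded length (clip + offset). The function name is hypothetical, the sketch ignores the Xing/LAME header and encoder padding frames, and the numbers are made up.

```python
def average_kbps(mp3_bytes, clip_frames, offset, rate=44100, compensate=True):
    """Average bitrate in kbps from file size (hypothetical helper).

    compensate=True divides by the padded length (clip + offset frames);
    compensate=False divides by the original clip length, which produces
    the skew described in NOTE # 2. Header and padding overhead is ignored.
    """
    frames = clip_frames + offset if compensate else clip_frames
    return mp3_bytes * 8 / (frames / rate) / 1000

# Made-up numbers: a 10 s clip (441000 frames) that encodes to 160000 bytes.
print(average_kbps(160000, 441000, 0))                        # 128.0
print(average_kbps(160000, 441000, 44100, compensate=False))  # 128.0
print(round(average_kbps(160000, 441000, 44100), 1))          # 116.4
```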

Reply #5
The idea is clever, but not new. FhG has some papers out about reverse encoders that work similarly (but smarter).

Ideally you want to run the filterbank in reverse on the output and look for zeroed regions in the frequency domain. The offset that gives the most/longest zeroed regions is the one corresponding to the encoding/decoding offset.

You can go much further than that actually, and reconstruct the original mp3 parameters to a large extent.

Reply #6
As pointed out by Garf, what you are noticing is used in the "reverse decoder" paper from FhG.
I am wondering what you will still find without reading this paper...

Reply #7
Of course, inverse decoder... I totally forgot about it.
So it is indeed possible to do it with a smarter, proper method. I knew I was on my way to re-invent something.

I'm guessing this inverse decoder is not available anywhere as a ready-to-use tool, free to download?


Oh, well...

In the meantime, I tested a few more things with this bitrate-offset approach. (Yes, feels like I made myself a "new toy" - have to play around with it a little before I get bored and throw it away...)

NOTE: everywhere below I used foobar's strong ATH noise shaping dither when decoding, so I don't mention it anymore.

Reply #8
[span style='font-size:10pt;line-height:100%']Detecting uncompressed Musepack[/span] files.

It is perhaps no longer surprising that the best MPC detection tool in my possession is the Musepack encoder.
However, what is surprising is that it is so easy to detect even braindead mpc!
Here is the graph:

Very characteristic peaks with a period of 32 samples. The lower-bitrate mpcs (q=3 and q=5) even have two series of peaks, "ups" and "downs", shifted by 16 samples relative to each other.
The braindead mpc (q=8) only shows "ups", but these are huge! Very different from the 320 kbps mp3.
(Edit: well, I need to remember that the 320 kbps mp3 really was 320 kbps, while the braindead mpc on this clip is only about 260 kbps.)

Next, the Musepack encoder cannot detect decoded mp3s: the graphs look similar to cdda no matter what the mp3 bitrate is.

I also tried it the other way around: detecting mpc with the LAME encoder. It is not possible, except for very low bitrate mpc (q=3), where the mp3 bitrate graph is visibly modulated by strong 32-periodic "shakes".

NOTE: Here the mp3 period of 576 samples is aligned with the mpc periods of 16 samples (576 = 16*36), but it doesn't help much, probably because the filters and transforms are different.
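The alignment in the note is plain arithmetic. A quick Python check (3.9+ for `math.lcm`) shows that 576 is a whole multiple of both Musepack periods seen above, so the two framing grids repeat together every 576 samples and every granule boundary sits on the mpc grid. The helper name `common_period` is hypothetical.

```python
from math import lcm

MP3_GRANULE = 576  # samples per mp3 granule (half of the 1152-sample frame)

def common_period(a, b):
    """Smallest offset period after which both framing grids line up again."""
    return lcm(a, b)

for mpc_period in (16, 32):
    # Prints: period, granule / period, common period
    print(mpc_period, MP3_GRANULE // mpc_period, common_period(MP3_GRANULE, mpc_period))
# 16 36 576
# 32 18 576
```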

I don't show any graphs here because they are not so interesting in this case.
The next part is more interesting.

Reply #9
Trying to detect [span style='font-size:10pt;line-height:100%']mp3-mpc transcoding[/span]

Here we have a choice - detect using mppenc or lame.


Well, mppenc seems to be unsuitable here, probably because it cannot detect plain mp3s, so transcoded mp3s are even harder for it.

One strange thing here is the bitrate graph for the 128 kbps mp3 transcoded to brainded q=8 mpc. There is some anomaly there, as if the graph (red) breaks in the middle and consists of two parts. Unexpected.



So, the better way is to detect mp3-mpc transcodes using mp3 encoder.

How easy or difficult it is to detect an mp3 transcode from the graph depends on the quality of both the mp3 and the mpc. The higher the mp3 quality and the lower the mpc quality, the harder it is to detect, as one would expect.
Here, for example, are two rather extreme cases:

1) 128 kbps mp3 transcoded to braindead q8-mpc: the graph has huge bitrate drops, very similar to the plain 128 kbps case without mpc, i.e. q8-mpc adds little noise and the 128 kbps mp3 has strong "built-in information".

2) 320 kbps mp3 transcoded to standard q5-mpc: it looks impossible to detect the mp3 source. Perhaps the noise introduced by q5-mpc is stronger than the "built-in information" from the 320 kbps mp3.

The following combination is somewhere in between:

320 kbps mp3 transcoded to braindead q8-mpc (click for larger image).
Here the bitrate fluctuation is defined as scaled_diff(N) = scale*( abs(V(N+1)-V(N)) + abs(V(N)-V(N-1))).
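A minimal Python sketch of this fluctuation measure, assuming the average bitrates are stored in a list indexed by offset N with contiguous offsets starting at 0. The value of `scale` is not given in the post; it is assumed here to be just a plotting constant. Because both differences are taken as absolute values, a single bitrate drop becomes one positive peak centred on N, with half-height neighbours, instead of the down/up pair that the signed diff(N) produces.

```python
def scaled_diff(v, scale=1.0):
    """scaled_diff(N) = scale * (|V(N+1) - V(N)| + |V(N) - V(N-1)|).

    v[N] is the average bitrate at offset N (offsets assumed contiguous from 0).
    Returns {N: value} for the interior offsets 1 .. len(v) - 2.
    """
    return {
        n: scale * (abs(v[n + 1] - v[n]) + abs(v[n] - v[n - 1]))
        for n in range(1, len(v) - 1)
    }

# A single made-up 20 kbps drop at N = 2:
print(scaled_diff([200, 200, 180, 200, 200]))  # {1: 20.0, 2: 40.0, 3: 20.0}
```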

The mp3 clearly adds new peaks. But "clearly" only in direct comparison with the original cdda or with the q8-mpc; if we don't know the original, it is not so clear how to define the threshold.
Compare with the q8-mpc: there are already some periodic peaks due to detecting mpc with the mp3 encoder (detecting the same mpc with mppenc puts the periodic peaks at different locations, so it really depends on the detector encoder too).
And the original cdda may have weak periodic peaks as well.
So the "weakness" of the peaks is relative. In this example the result is actually not "clearly mp3", but rather "suspicious" or "possibly mp3".



Ah, if only we had inverse decoders...
One could detect mpc first, then apply an "inverse mpc decoder", and then try to detect mp3 with an inverse mp3 decoder...

(Edit: why do I spell "brainded" all the time...)

Reply #10
This is really fascinating;  I'm looking forward to the rest.

Reply #11
Quote
This is really fascinating;  I'm looking forward to the rest.

Thanks! 

Here we go...
Detection of [span style='font-size:10pt;line-height:100%']mp3-mpc transcoding[/span], part 2.

To complete this topic I produced a few more graphs.

Again, the bitrate fluctuation is defined as scaled_diff(N) = scale*( abs(V(N+1)-V(N)) + abs(V(N)-V(N-1))).
This choice seems to be the most appropriate when detecting mp3. With mp3 bitrates, this formula captures the typical zig-zag bitrate jumps and produces nice, clearly visible peaks at the mp3 frame boundaries.

Here is another illustration of using this fluctuation formula:

First, the normal bitrate-offset graph for the quality-5 MPC transcoded from various mp3s.

One might argue that yes, there are peaks for the 128 kbps mp3, but for the APS mp3 the peaks are not so big, maybe it's not so apparent, blah-blah...

But look at the fluctuation graph (click for high-res png):

The peaks are now totally apparent also for the APS-mp3.
(The 320kbps mp3 was not possible to detect here, so it is not shown).

So, an interesting and maybe useful observation:
[span style='font-size:10pt;line-height:100%']It is no problem to detect quality 5 mpc transcoded from Alt-Preset-Standard mp3[/span].
(Well, at least with this one music clip which I tried here. More clips need to be tested.)

As I mentioned before, it would be even easier to detect the same mp3 quality in a higher-quality mpc, or a lower-quality mp3 in the same mpc.
Together with the graphs from my previous posts, this gives some idea of what may be possible to detect.



Here I tested a few issues with mp3-mpc transcoding: [span style='font-size:10pt;line-height:100%']Influence of dither and mp3-mpc frame alignment[/span].
The case of braindead q8-MPC transcoded from 320 kbps mp3 (click for high-res png).


I should explain:

1. By "dither" here I refer to the dither of the MPC file only. (All source input mp3s here were dithered upon decoding).
As expected, dither (strong ATH noise-shaping) of MPC files makes it harder to detect mp3-transcoding, as the peaks become less apparent. (Notice the red and blue peaks overlap)

2. mp3-mpc frame alignment issue.

Note that in my previous graphs the LAME mp3s were decoded with compensation for the encoder-decoder delay, which produces a decoded wave aligned with the original cdda wav. So when the MPCs are encoded, they are aligned with the cdda, but also with the mp3: the first MPC frame starts exactly at the position of the first mp3 frame.

I wanted to test what happens when the MPC frames are not aligned with the mp3 frames. So I took the decoded mp3 and cut off 47 samples at the beginning (producing offset = -47). Thus the first MPC frame is shifted by 47 samples relative to the mp3 frame.
The number 47 = 1152 - 1105, i.e. it is the offset you would get when decoding without compensating for the encoder-decoder delay.
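The 47-sample figure can be checked with modular arithmetic: shifting the wav by -47 samples puts the mp3 frame boundaries at the same phase, modulo the 1152-sample frame (and also modulo the 576-sample granule), as leaving the full 1105-sample delay in. A minimal Python sketch; the constants are the ones from the post and `frame_phase` is a hypothetical helper.

```python
MP3_FRAME = 1152    # samples per mp3 frame
TOTAL_DELAY = 1105  # encoder + decoder delay, as given in the post

def frame_phase(offset, frame=MP3_FRAME):
    """Reduce a sample offset modulo the frame length.

    Offsets with equal phase put the frame boundaries at the same positions.
    Python's % always returns a non-negative result for a positive modulus.
    """
    return offset % frame

print(MP3_FRAME - TOTAL_DELAY)   # 47
print(frame_phase(-47))          # 1105: cutting 47 samples...
print(frame_phase(TOTAL_DELAY))  # 1105: ...matches leaving the delay in
```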

As one can see from the graph, the mp3 which is not aligned with mpc (green color) may be easier to detect - the peaks are stronger.

Reply #12
I think the end of this saga is very near. Just one Ogg graph today, and there will probably be a couple of AAC graphs tomorrow, and that's it: these will be the last.
(That is unless someone wants me to test something specific.)

We already saw that mp3 graphs and Musepack graphs are quite different in character. I am now testing Ogg and AAC, and it appears that each format produces a very distinctive graph shape.


Anyway...Detection of [span style='font-size:11pt;line-height:100%']Ogg-Vorbis framing[/span] in uncompressed waves

Here is the graph for the Vorbis file of quality = 9 (click for hi-res png):

Bitrate drops with a period of 128 samples: very strong drops even for such a high-bitrate Vorbis source.
The curves for the lower-bitrate Oggs (q=6 and q=4) are not shown in the figure; they are very similar, with much stronger peaks.

To be honest, I am quite surprised by this result. It is not what I expected to see.
[span style='font-size:11pt;line-height:100%']QUESTION: Why does the graph only have a very strong 128-period and no other, larger period?
[/span]
(see wide offset range of the same graph)

With mp3 we had 576-periodic drops, which fit nicely with the mp3 frame boundaries (granule boundaries, to be more exact). An mp3 granule may contain either 1 long block or 3 short blocks. The fact that only 576-periods are visible suggests that perhaps the long-block granules are responsible for these periods.
(In the mp3s of this particular music clip there were mostly long block frames and a few short blocks.)

With Musepack the picture was more complex: the graphs had a weak 1152-period corresponding to the mpc frame size, and strong 32-sample periods (maybe due to Musepack's specific subband filterbank?)
In any case, both mp3 and musepack graphs are "easy" to understand.

Ogg Vorbis (if I read the documentation correctly) can have frames of different sizes; each frame contains only one block, either short or long. The two block sizes are fixed at codec setup and lie between 64 and 8192 samples.
Quote
Vorbis frames may be one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples.
The above Ogg graph only has 128-periods, which I'm guessing is the size of the short blocks.
I would expect two possible scenarios:
  • if the distribution and sequence of short/long blocks in the original Ogg and the re-encoded Ogg are different, then there should be no strong correlation and no bitrate drops.
  • if the sequence of short/long blocks is the same, or if short blocks are grouped to match the size of a long block (as in mp3), then there should be periods matching the long block, similar to mp3.
I must misunderstand something here...
[span style='font-size:11pt;line-height:100%']QUESTION: How does Vorbis actually work? Why does the graph show only 128-periods?[/span]

--------------------

The results I'm getting for AAC are also quite exciting! I'll post them all tomorrow.
[span style='font-size:11pt;line-height:100%']Anyone dare to predict the shape and characteristic features of AAC-graphs?[/span]

Reply #13
Quote
[span style='font-size:11pt;line-height:100%']QUESTION: how does Vorbis actually work? why the graph shows only 128-periods?[/span]


I think (to be verified) that current usual Vorbis encoders are using a long block size of 2048 samples down to q 0. For negative q values, 4096 samples are used.

Reply #14
Quote
I think (to be verified) that current usual Vorbis encoders are using a long block size of 2048 samples down to q 0.

Oh, so if 2048 is indeed the block size (not the window size?), then I should test a range of offsets twice as wide as the one I tried. Perhaps something will become visible.

Still, it feels strange that the short 128-periods are so strong, so different from mp3. Maybe the real explanation is not directly related to the short block size, but at first sight it seems to have something to do with it.

Here is the mp3 graph for comparison:
I encoded the clip with the LAME --allshort option, and --allshort was used for re-encoding too.

Interesting thing is that the mp3 bitrate here doesn't drop down, it jumps up.
That is, the effect is opposite in comparison with those mp3s which have mostly long blocks.

On the Ogg graph the bitrate only drops, it doesn't jump. On the Musepack graphs we had both drops and jumps at the same time. Well, the explanation is probably more complicated than simple short- or long-block correlation.

Reply #15
Now to the promised [span style='font-size:11pt;line-height:100%']detection of AAC framing [/span] in uncompressed wavs.

I encoded some AACs with PsyTEL encoder and I am using latest FAAC VBR "-q 200" for detection. Don't ask me why...


Clear 1024-periodic bitrate drops for the low- and medium-bitrate AACs (streaming and extreme). With the high-bitrate "ultra" AAC the bitrate jumps rather than drops, and it is harder to see in this graph.

But what the hell are these sudden changes in the bitrate level on the CDDA curve? The AAC curves more or less follow these changes too. It is as if the curve suddenly jumps from one level to another, back and forth.
(Actually it looks similar to a graph we have seen before: 128 kbps mp3 transcoded to braindead q=8 mpc.)
Totally unexpected. And exciting. I have no idea what is going on here.

Here are some zoomed-in parts of the figure:
sudden changes of bitrate level
N = 1024
N = 512
N = 256


These sudden changes of level are responsible for what I called "false peaks" on the fluctuation graph.

The "streaming" and "extreme" AACs are easy to identify by their very strong 1024-periodic peaks and also weaker 256-periodic peaks.


Here is the fluctuation graph for the high-bitrate "ultra" AAC:

The curve doesn't have any 256- or 512-periodic peaks, but it has good visible 1024-peaks. It would have been possible to identify ultra AAC here, but the "false peaks" ruin the picture - they are as strong as the "true" 1024-peaks, and it would be impossible to tell them apart if we didn't know which is which.

Reply #16
Finally, some graphs concerning [span style='font-size:11pt;line-height:100%']detection of mp3-transcoding in AAC files[/span].

I transcoded some mp3s to PsyTEL "streaming" AAC (a "128 kbps-ish" preset):

The mp3 signature of "preset standard" is too weak and is lost after transcoding, impossible to identify.
On the other hand, mp3 128 kbps still produces nice 576-periodic peaks.
So it should be [span style='font-size:10pt;line-height:100%']possible to identify 128 kbps mp3 transcoded to 128 kbps AAC[/span].

That's all.
[span style='font-size:21pt;line-height:100%']The END[/span]


To make further progress I had better try to understand how the inverse decoder works. This will surely take quite a while...

Reply #17
"However, I didn't look hard at all to find another sample where Aucdtect was wrong: an old 128kbps mp3 - aucdtct thinks it is 100% cdda."


D:\2>auCDtect.exe -ms0 golosa_mp3.wav
auCDtect: CD records authenticity detector, version 0.8.2
Copyright © 2004 Oleg Berngardt. All rights reserved.
Copyright © 2004 Alexander Djourik. All rights reserved.
------------------------------------------------------------
Processing file:        [golosa_mp3.wav]
Data analysis:          [100%]
------------------------------------------------------------
This track looks like MPEG with probability 92%


Reply #18
"I encoded some AACs with PsyTEL encoder and I am using latest FAAC VBR "-q 200" for detection. Don't ask me why..."

hmmm the new aucdtect failed...
C:\Converted Music>auCDtect.exe -ms0 "test_.wav"
auCDtect: CD records authenticity detector, version 0.8.2
Copyright © 2004 Oleg Berngardt. All rights reserved.
Copyright © 2004 Alexander Djourik. All rights reserved.
------------------------------------------------------------
Processing file:        [test_.wav]
Data analysis:          [100%]
------------------------------------------------------------
This track looks like CDDA with probability 100%

As I showed above, "from mp3" detection has improved... but "from AAC" failed...
Interesting... I hope the aucdtect "team" will fix that problem...

Reply #19
I must say that the most amazing result of this thread so far is that someone is still reading it!

Quote
hmmm the new aucdtect...

I used the same version (which is, in fact, quite old), but with default settings.

Quote
as i show you above..."from mp3" detect improved... but "from AAC" failed...
interesting... i hope the aucdtect "team" will fix that problem...


Try feeding aucdtect with a strong-ath-dithered mp3. I think it used to say it was cdda almost all the time. This was a known weakness of aucdtect.
From what I read, they used a very simple formula with a few parameters to describe the energy distribution in the high frequency range. I'm not sure that it can be easily fine-tuned to detect strong dither in a consistent manner. I don't follow the development of tau analyzer. Maybe they have fixed it somehow already.