

A different approach to transcoding?

OK, I'm no scientist nor engineer. I have some level of acquaintance with technical stuff, but I claim no credentials or competence on the matter.
The following is just an amalgamation of ideas I had as a result of casually reading high-level technical articles on how digital audio coding works.
Most probably the idea is not new, and has probably already been dismissed(?), but my searching did not yield much in the way of a definitive answer; of course, an answer may well exist and I simply overlooked it. So, please, don't be angered by this noob's inquiry. If the matter annoys you, please just ignore me, but if you know about this and care to spend some time laying it out for me, please do so.

The matter I'd like to discuss, or have explained, is lossless transcoding or conversion between lossy formats. The main purpose would be to get rid of files encoded in old/deprecated/proprietary formats (Real Audio, anyone?) and move them into more modern, preferably open, formats, particularly in the case of material for which there's no other reasonable source.
The basic concept behind this would-be new(?) technique would be partial decoding: decode up to a point/layer at which you find some sort of compatible data or signal representation, then re-encode that, losslessly or close to it, in the target format.
Now, to my understanding, one of the basic concepts of lossy audio coding is the conversion of the PCM signal to the frequency domain; the resulting data then goes through a psychoacoustic evaluation that determines which parts of it are discarded or encoded with lesser fidelity. Would it be feasible or useful to decode an existing file in one of these legacy formats up to the point where you get that frequency-domain data, and re-encode it in the target format, skipping the psychoacoustic evaluation stage?
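For illustration, here is a minimal Python sketch of that chain as I picture it. The plain, non-overlapping MDCT and the single fixed rounding step are simplifying assumptions of mine; real codecs use windowed, overlapping transforms and per-band precision chosen by the psychoacoustic evaluation.

    import numpy as np

    def mdct(frame):
        # Direct MDCT: 2*N time samples -> N frequency coefficients.
        n2 = len(frame)
        n = n2 // 2
        k = np.arange(n)[:, None]
        t = np.arange(n2)[None, :]
        return np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5)) @ frame

    def encode_frame(frame, step=0.05):
        coeffs = mdct(frame)            # PCM -> frequency domain
        # A psychoacoustic evaluation would pick the precision per band;
        # one fixed step stands in for it in this sketch.
        return np.round(coeffs / step)  # coarse rounding = the fidelity loss

    pcm = np.sin(2 * np.pi * 440 * np.arange(2048) / 44100)  # toy signal
    print(encode_frame(pcm)[:8])        # a few quantized spectral lines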

So, in short, the idea would be to find compatible or comparable steps in the encode and decode stages of the different codecs, decode the source files up to that point, and, roughly, recode that data as losslessly as possible in the target format.
Is this possible? Feasible? Reasonable?

Please, share your thoughts and thanks for your time.

A different approach to transcoding?

Reply #1
The basic concept behind this would-be new(?) technique would be partial decoding: decode up to a point/layer at which you find some sort of compatible data or signal representation, then re-encode that, losslessly or close to it, in the target format.
[...]
So, in short, the idea would be to find compatible or comparable steps in the encode and decode stages of the different codecs, decode the source files up to that point, and, roughly, recode that data as losslessly as possible in the target format.
Is this possible? Feasible? Reasonable?


It is by all means possible in theory, but in the same theory where password cracking is trivial (by brute force) if you have an unlimited number of attempts and unlimited time to wait, and that, I think we can all agree, is way outside the "reasonable" for encoding. Sketching a development path under the unreasonable "time is no object" assumption is easy.

There is a repacking tool for MP3 to MP3 (mp3packer), and I think that is all (not counting MPEG-4 scalable-to-lossless). AFAIK there is not even anything of the sort from MPEG-something to MPEG-something-else.

A different approach to transcoding?

Reply #2
To follow up on myself: I have asked a few times here at HA (the discussions should be possible to find) why lossless codecs are not better at compressing decoded lossies. After all, 7 megabytes of MP3 need not take more space than 7 megabytes, and therefore the decoded signal does, patently, fit into 7 megabytes if properly packed.

The best explanation I have been able to get is that finding the 7-megabyte representation of the decoded signal again is pretty much like password cracking or brute-force decryption. Say you protect (= encrypt) a spreadsheet, then zip both the original and the protected copy. You won't get much compression out of the latter, because encryption, on purpose, creates a file that looks like white noise (systematic patterns would ease cracking). So encrypted files are another example of something that is extremely hard to compress, although the original, which has basically the same content, may be easy to compress. If you have e.g. 7-Zip, you can try encrypting without compressing, then putting the result into another compressed file, and so on. (I have barely tested it.)
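A quick way to see it for yourself, assuming Python's zlib as the compressor and os.urandom as a stand-in for cipher output (good encryption produces bytes statistically close to random noise):

    import os, zlib

    structured = b"the quick brown fox jumps over the lazy dog\n" * 1000
    noise_like = os.urandom(len(structured))  # stands in for encrypted bytes

    print(len(structured), "->", len(zlib.compress(structured)))  # shrinks a lot
    print(len(noise_like), "->", len(zlib.compress(noise_like)))  # grows slightly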

A different approach to transcoding?

Reply #3
One of the problems would be that the frequency-domain representation refers to a certain number of waveform samples, which makes up the frame size, and this number varies from codec to codec. So a necessary condition is that the target codec allows an arbitrary frame size. Thus AAC can't be the target codec, AFAIK. Maybe Ogg Vorbis can be.
But I guess there are many more restrictions.
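To put rough numbers on the frame-size clash, here's a toy calculation assuming MP3's 576 spectral lines per granule against AAC's 1024 per long block:

    from math import lcm

    mp3_granule = 576   # MDCT lines per MP3 granule
    aac_block = 1024    # MDCT lines per AAC long block

    # Block boundaries of the two codecs only coincide every
    # lcm(576, 1024) = 9216 samples, i.e. once per 16 MP3 granules
    # (9 AAC blocks); nearly every AAC block straddles MP3 granules.
    print(lcm(mp3_granule, aac_block))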

If it's only about getting away from old codecs, I guess you get closest to what you want by decoding the old codec to WAV and re-encoding with lossyFLAC (or lossy WavPack as an alternative). As long as you can allow for average bitrates in the 300...400 kbps range you'll be fine.

A different approach to transcoding?

Reply #4
I wonder if anyone has ever tried to modify a lossy encoder so that it skips the psychoacoustic evaluation (or the part of it that discards stuff), thereby taking the incoming audio as already evaluated. I wonder what the results would be.

Could it be as simple as that? Probably not, but has this ever been tested?
Can such steps in the encoding process be bypassed like that? I suppose it's not that simple, or is it?

As far as I'm concerned, for such a contraption, keeping the original bitrate is not even important (though the point is also to avoid the huge size increase that lossy-to-lossless normally causes). I just thought that a scheme designed to encode psychoacoustically evaluated audio would work better to, say, "repack" foreign psychoacoustically evaluated audio than a lossless scheme would, since lossless schemes are naturally designed to store full-band, "no-holes" audio data; hence the whole lossy-to-lossy idea.

Another idea, though this one sounds even dreamier and more unrealistic than the former, would be lossy-to-lossless transcoding with customized decoders and encoders that, in addition to exchanging a PCM stream, would also derive some "hints" from the decode process to tell the encoder where the "holes" are, so to speak, so that it can just ignore those parts or something. But it seems to me that lossy codecs would still be better for that, since they are designed for frequency-domain encoding, where lossless codecs aren't(?), since that wouldn't yield bit-exact storage(?). Correct me if I'm wrong.

----
@Porcus: I knew about, and have actually used, mp3packer in the past. Cool thing. I used it to shrink a bunch of old CBR files, though it really doesn't have much relation to this, if you ask me. I wasn't aware of MPEG-4 SLS, though it doesn't seem all that related either.
You could also have mentioned the never-implemented(?) Vorbis peeling feature, which again wouldn't have all that much relation xD.

Also, yes, compressing encrypted data doesn't seem such a good idea. Encrypting compressed data, on the other hand, works.
I've never actually encrypted an archive myself, though. Never had such a need. Basically only monetary, legal or privacy motivations seem to reasonably require that... surely not the case of people who put up "free stuff" on the net and password-protect their ZIPs... so that when the download finishes and you've already forgotten where you got it from, you can't open it. THANKS! xD

@halb27: Thanks for the frame-size bit. Also for the lossyFLAC pointer. I had seen it before but didn't pay much attention to it, and I'm not sure I fully get the point of it. How does lossySOMETHING generally compare to the usual lossy formats? I guess the type of degradation is different: a variably higher noise floor vs. frequency removal.
In my mind I remember really ugly lossy encodes where the psychoacoustic masking was pushed beyond its real ability to mask what was removed, which made listening rather exhausting; I suppose I would prefer some noise instead, but I guess modern encoders work better than that. I'm not talking transparency here, but rather listening comfort.
----

Anyway, all of this is just out of curiosity, for the sake of discussion. Nothing actually needs to come of it, though it would be really nice.
Nor am I terribly bothered by actually having stuff locked in shitty formats. I do have some, but I think I can live with it for now. Of course, if such a thing existed, I would do the conversion right away and "sanitize" all that stuff, but it's not urgent.
Also, lacking such a lossless lossy-to-lossy transcoding solution, for music I would just FLAC 'em in most cases, in order to preserve them; anything else I would consider case by case: lossy, lossless, lossyFLAC or whatever.

A different approach to transcoding?

Reply #5
The lossy step is quantization, not the psymodel. The latter just tells the quantizer how to allocate bits. You can't skip quantization, although a really bad encoder could allocate bits without considering the signal.

In general you can't do this, because the frequency transform and the quantization are handled differently between codecs. So, at a minimum, you have to inverse-transform, apply the new codec's forward transform, and requantize, and that spreads your old quantization error across the new coefficients.
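A rough numerical sketch of that spreading, using SciPy's DCT as a stand-in for two codecs' filterbanks (real codecs use windowed MDCTs, so this only illustrates the principle):

    import numpy as np
    from scipy.fft import dct, idct

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)

    # "Codec A": transform, quantize coarsely, decode back to PCM.
    decoded = idct(np.round(dct(x, norm='ortho') * 4) / 4, norm='ortho')

    # "Codec B": re-transform the decoded PCM with a different frame split.
    b_new = dct(decoded.reshape(2, 512), axis=1, norm='ortho')
    b_ref = dct(x.reshape(2, 512), axis=1, norm='ortho')

    # A's rounding error, once confined to its own coefficients, is now
    # smeared over essentially all of B's coefficients.
    print(np.count_nonzero(np.abs(b_new - b_ref) > 1e-8), "of 1024 disturbed")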

A different approach to transcoding?

Reply #6
The lossy step is quantization, not the psymodel. The latter just tells the quantizer how to allocate bits.


I see. I guess that, lacking this information, I imagined the psymodel and the quantizer as a single step and named the whole after its first part. Good to know, thanks.

A different approach to transcoding?

Reply #7
There is at least one published paper on this subject:

Koichi Takagi, Satoshi Miyaji, Shigeyuki Sakazawa, Yasuhiro Takishima: "Conversion of MP3 to AAC in the Compressed Domain", in Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006.

Basically, they used the MP3 scalefactor values to control the AAC bit allocation. Of course there is still some added distortion, but the authors claim that the obtained perceptual quality was higher than that of a usual transcode. The target AAC bitrates were quite low, though (96 and 64 kbps), and they used the ISO reference encoder. I'd guess that at higher bitrates, and maybe with a more advanced encoder, the only potential gain would be increased transcoding speed.
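Schematically, and as a toy numpy illustration only (the band layouts and step rule below are made up for the example, not taken from the paper), the idea is to inherit the target codec's per-band quantizer steps from the source's instead of running a fresh psychoacoustic analysis:

    import numpy as np

    rng = np.random.default_rng(1)
    spectrum = rng.standard_normal(1024)  # pretend: one decoded MDCT frame

    # "Source" codec: 32 bands of 32 lines; per-band step derived from
    # band energy (a crude stand-in for psymodel-chosen scalefactors).
    src_steps = 0.1 * np.sqrt((spectrum.reshape(32, 32) ** 2).mean(axis=1))

    # "Target" codec: 16 bands of 64 lines. Instead of running its own
    # psymodel, each target band inherits the mean step of the two
    # source bands it covers.
    dst_steps = src_steps.reshape(16, 2).mean(axis=1)

    quantized = np.round(spectrum.reshape(16, 64) / dst_steps[:, None])
    print(quantized.shape)  # (16, 64): coefficients quantized for the target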

It's also worth noting that MP3 and AAC-LC are pretty close in terms of data-representation concepts. It would be much less straightforward to use this approach for, e.g., MP3-to-Vorbis transcoding.

@Porcus: you seem to be referring to a different problem: encoding a PCM stream that is known to have previously been encoded by a similar algorithm, when the encoded source is no longer available. That is not exactly what the OP's question is about.