Thanks again for the reply! (This is the only place where I got ANY feedback, which is why I keep pursuing this...)
QUOTE(magic75 @ Nov 19 2004, 06:22 AM)
To the best of my knowledge it cannot be done on the actual MP3 data because the process of mixing involves adding and possibly scaling sample-by-sample.
Yes, mixing .wav files is "basic" mathematics: the algorithm simply sums the corresponding samples. (It is not quite that easy, but something like this.)
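To make the "summing" concrete, here is a minimal sketch of sample-by-sample mixing, assuming both inputs are already decoded to signed 16-bit PCM at the same sample rate. The function name `mix_pcm16` and the `gain` parameter are made up for illustration; this is not taken from any particular tool.

```python
import array

def mix_pcm16(a, b, gain=0.5):
    """Mix two sequences of signed 16-bit samples, sample by sample.

    Hypothetical helper: the shorter input is padded with silence,
    the sum is scaled by `gain`, and the result is clipped to 16 bits.
    """
    n = max(len(a), len(b))
    out = array.array("h")  # "h" = signed 16-bit integers
    for i in range(n):
        sa = a[i] if i < len(a) else 0
        sb = b[i] if i < len(b) else 0
        mixed = int(round((sa + sb) * gain))
        # Clip so the sum cannot overflow the 16-bit range.
        out.append(max(-32768, min(32767, mixed)))
    return out
```

The scaling and clipping are the "not quite that easy" part: a plain sum of two loud signals can exceed the 16-bit range.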
I do fear that it is impossible in theory to do the same with the corresponding segments of the corresponding frames of different mp3s.
But I am not quite sure that it is impossible (I am no expert in this field, far from it). I am unsure because I think that the audio data contained in the body of an mp3 frame may be somehow time-ordered.
What I mean is that the audio data is either divided into successive smaller parts, or it constitutes a single static block. In the latter case, two corresponding blocks like this (from two different mp3s) may be summed together somehow. (I really don't know how it works.) In the former case, the smaller time-representing parts may be summed, if they are in sync with each other in the different mp3s.
Sorry for insisting on this. A scientific NO is enough for me, but I must see the definitive answer before giving up this search.
QUOTE
And all lossy codecs like MP3 have a completely different representation of audio than a sample-based one. Instead, longer chunks (blocks) are used. I am not an expert in this area, but as I understand how lossy encoding of audio works, it shouldn't be possible to do this without something that at least resembles decoding and encoding again.
Thanks. I see that part of my problem is that I can't differentiate between "blocks" and "chunks"... But I feel I will not be able to learn the mp3 file structure in its entirety.

(And at the same time I dream of file-level operations... though those would be done by expert programmers.)
QUOTE
About the filesize issue. It seems to me that it should be (technically, maybe not practically) possible to work around that by just decoding chunks (in memory, not on disk), mixing, and encoding on-the-fly.
Thank you for the idea!
Though I am a bit worried, because in my experience separately encoded parts cannot be joined seamlessly. And AFAIK the psychoacoustic modelling involves analysing a longer stretch of the source material than a couple of samples. But even so, this may be a solution.
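A rough sketch of the decode, mix, encode on-the-fly idea, so that only one chunk per input is in memory at a time. Everything here is hypothetical: `decoded_a` and `decoded_b` stand for whatever iterators a real MP3 decoder would give you (yielding equally sized chunks of 16-bit samples), and `encode_chunk` stands for the input side of a real encoder.

```python
from itertools import zip_longest

def mix_chunk(a, b):
    """Average two chunks of 16-bit samples, padding the shorter with silence."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [max(-32768, min(32767, (x + y) // 2)) for x, y in zip(a, b)]

def mix_streams(decoded_a, decoded_b, encode_chunk):
    """Pull one chunk at a time from each decoder, mix, and hand it on.

    Assumption: both decoders yield chunks of the same length, and
    `encode_chunk` feeds ONE continuous encoder rather than encoding
    each chunk as a separate file.
    """
    for chunk_a, chunk_b in zip_longest(decoded_a, decoded_b, fillvalue=()):
        encode_chunk(mix_chunk(chunk_a, chunk_b))
```

As far as I can tell, the seam worry mostly applies when each part is encoded separately and the files are joined afterwards. If the mixed chunks are fed into a single encoder that keeps its own state between calls, the chunk boundaries should not show up in the output, though I have not verified this against a real encoder.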
Thnx again.
P.S.: Do you have any idea where I should ask for my "Final Answer"? (42)