Topic: when do I flush with lame?

when do I flush with lame?

I just built the latest LAME library on a Debian box and I am trying to work through its API. I have a couple of questions about using the LAME API, in particular about the lame_encode_buffer() and lame_encode_flush() routines.

When I built the Opus lib, it was very specific about the frame size/period, but I cannot find anything specific about the frame size or period in LAME.
Q1: can I use any frame size/period with MP3/LAME? Is there an optimal frame size/period?

I am currently calling lame_encode_buffer() for every captured frame (the frame size/period is configurable at run time; say 40 milliseconds, for example). I noticed that both lame_encode_buffer() and lame_encode_flush() return non-zero, so both give me encoded data.
Q2: do I have to treat the two buffers separately? That is, if lame_encode_flush() returns more MP3-encoded data, do I have to append those flushed MP3 bytes to the results of the prior call to lame_encode_buffer()?

My application is streaming live audio.
Q3: should I always call lame_encode_flush() after I call lame_encode_buffer()?

Thanks for any help,

-Andres

when do I flush with lame?

Reply #1
A3: the LAME frontend (frontend/lame_main.c) calls lame_encode_flush() only once, at the end of encoding. The same is true for mp3rtp.c.
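
For reference, a minimal sketch of that pattern (this is not the actual frontend code; read_pcm() and the file handling are hypothetical stand-ins):
Code: [Select]
#include <lame/lame.h>
#include <stdio.h>

#define NSAMP 1152   /* samples per channel per call; any size works */

/* read_pcm() is a hypothetical source of 16-bit PCM; it returns the
   number of samples per channel actually read, 0 at end of input */
extern int read_pcm(short *left, short *right, int nsamples);

void encode_to_file(FILE *out)
{
    lame_global_flags *gfp = lame_init();
    lame_init_params(gfp);   /* library defaults */

    short left[NSAMP], right[NSAMP];
    unsigned char mp3[NSAMP * 5 / 4 + 7200];   /* worst case per lame.h */
    int n, bytes;

    while ((n = read_pcm(left, right, NSAMP)) > 0) {
        bytes = lame_encode_buffer(gfp, left, right, n, mp3, sizeof mp3);
        if (bytes > 0)
            fwrite(mp3, 1, bytes, out);
    }
    /* flush exactly once, when there is no more input */
    bytes = lame_encode_flush(gfp, mp3, sizeof mp3);
    if (bytes > 0)
        fwrite(mp3, 1, bytes, out);

    lame_close(gfp);
}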

when do I flush with lame?

Reply #2
Thank you for your reply.

when do I flush with lame?

Reply #3
A couple more questions regarding lame_encode_flush() in the LAME API.

lvqcl has pointed out that in the frontend example code, lame_encode_flush() gets called only once, at the end of encoding data sourced from a file. However, in an application that is streaming MP3-encoded data, for all practical purposes the end-of-file never occurs (I guess it occurs at the end of the broadcast, but it does not seem to apply to the steady-state window of the stream).

This poses a problem (at least it is a problem to *me* because of my limited understanding). To stream MP3 data in an RTP session, the encoded data is formatted according to RFC 2250, which adds a 4-byte header at the start of the encoded MP3 data. The purpose of this header is to specify the offset of any fragmented MP3 frames. If the RTP pkt contains an integer number of complete (i.e. NOT fragmented) frames, then this 4-byte header should simply be 0, meaning there are no fragmented MP3 frames in the RTP pkt. But if the RTP pkt payload contains a fragmented MP3 frame, then this 4-byte field must specify the offset of that pkt's payload data within the frame begun in a previous RTP pkt (which contains the first part of the fragmented data).
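
For concreteness, here is a sketch of that header as RFC 2250 defines it for MPEG audio (16 bits that must be zero, followed by a 16-bit fragmentation offset, in network byte order); the surrounding buffer handling is illustrative only:
Code: [Select]
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons */

/* Write the 4-byte RFC 2250 MPEG audio header at the start of an RTP
   payload. frag_offset is the byte offset of this fragment within the
   MP3 frame it belongs to; 0 when the payload starts on a frame
   boundary. Returns the header length in bytes. */
static size_t write_rfc2250_header(uint8_t *payload, uint16_t frag_offset)
{
    uint16_t mbz = 0;                    /* 16 bits, must be zero */
    uint16_t off = htons(frag_offset);   /* 16-bit fragmentation offset */
    memcpy(payload, &mbz, 2);
    memcpy(payload + 2, &off, 2);
    return 4;
}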

I do not see anything in the LAME API that would allow me to determine when a given frame is fragmented after I call lame_encode_buffer().
Q1: how do I determine that offset for the final fragmented MP3 frame after the call to lame_encode_buffer()?

The frontend example code appears to imply that the frames are NOT fragmented. The RTP header struct "rtpheader" includes this additional 4-byte fragmentation header (as int iAudioHeader) as specified by RFC 2250, but it is set to 0 and never changed during encoding. So that code seems to assume that there will never be any fragmented frames in the encoded data returned by lame_encode_buffer().

But the only way I have been able to ensure that I obtain a complete frame is to call lame_encode_flush() after each call to lame_encode_buffer(). That way I always get an MP3 frame header at the beginning of the encoded data returned from each call to lame_encode_buffer().

Q2: so is it *wrong* to call lame_encode_flush() after each lame_encode_buffer()? It seems that is the only way I can get an encoded stream suitable for inclusion in an RTP pkt with a fragmentation header of 0.

Thanks,

-Andres

when do I flush with lame?

Reply #4
Don't take this as an authoritative voice, but I believe you're wrong on both fronts:

* On the buffer versus flush front:
Since an MP3 frame always contains the same number of samples (i.e. input size) but the output has a varying number of bytes (ABR, VBR, padded vs non-padded CBR frames, bit reservoir...), the buffer call is intended as a "running" call: the encoder takes these input bytes and, if it has generated enough output bytes, there will be output; otherwise you are expected to supply the buffer method with more input bytes.
Then, for streams that have a determined size (i.e. files on disk), there is an end, so there is a way to tell the encoder that there will be no more calls and that it needs to output whatever is left. Not only that, but there are also the Xing/LAME tags (VBR seek information, gapless playback...).
On a continuous stream, flush is not needed because there will always be another "buffer" call, except when you stop streaming (see the sketch after this list).

* On the RTP packets:
IIRC, the streamed packets are of a fixed size. Since the MP3 frames can differ in size (ABR/VBR), or simply be of a size not divisible by the RTP packet size (the various CBR sizes), it is up to you to tell how you are dividing the data you send, so that the receiving end can start decoding on the proper frame without having to skip partial data until it finds a frame start.
Obviously, on a stream, people connect after the stream has started.
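
A minimal sketch of the streaming pattern described above, with capture_frame() and send_mp3() as hypothetical stand-ins for the capture and transport sides:
Code: [Select]
#include <lame/lame.h>

#define SAMPLES_PER_FRAME 1152   /* per channel; any size works */

/* hypothetical capture/transport hooks */
extern int  capture_frame(short *pcm, int nsamples);   /* interleaved L/R */
extern void send_mp3(const unsigned char *data, int len);
extern int  streaming;

void stream_loop(lame_global_flags *gfp)   /* gfp already initialized */
{
    short pcm[SAMPLES_PER_FRAME * 2];
    unsigned char mp3[SAMPLES_PER_FRAME * 5 / 4 + 7200];
    int n, bytes;

    while (streaming) {
        n = capture_frame(pcm, SAMPLES_PER_FRAME);
        bytes = lame_encode_buffer_interleaved(gfp, pcm, n, mp3, sizeof mp3);
        if (bytes > 0)   /* may be 0 while the encoder accumulates input */
            send_mp3(mp3, bytes);
    }
    /* flush only when the broadcast actually ends */
    bytes = lame_encode_flush(gfp, mp3, sizeof mp3);
    if (bytes > 0)
        send_mp3(mp3, bytes);
}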

when do I flush with lame?

Reply #5
Thank you for your response. I am definitely confused, so I appreciate your explanations.


Quote
Since an MP3 frame always contains the same number of samples (i.e. input size) but the output has a varying number of bytes (ABR, VBR, padded vs non-padded CBR frames, bit reservoir...), the buffer call is intended as a "running" call: the encoder takes these input bytes and, if it has generated enough output bytes, there will be output; otherwise you are expected to supply the buffer method with more input bytes.


This makes sense to me. I have observed that subsequent calls to lame_encode_buffer() deliver encoded data that *appears* not to fall on a frame boundary. That is, I never see an MP3 frame header in the first bytes returned by any of the subsequent calls to lame_encode_buffer(). My interpretation of this observation is that the data returned by subsequent calls to the _buffer() routine is fragmented across frame boundaries.

Is this where I am wrong? In other words, does the _buffer() routine always return an integral, decodable frame?

I thought the bit reservoir functionality meant that the encoder could stuff bits from other frames into a frame, and that this was why I was not seeing any MP3 frame headers at the start of the data returned from _buffer(). If the first bytes returned from a subsequent call to lame_encode_buffer() are an MP3 frame header, then I am assured that the previous call delivered the end of the previous decodable frame. But if the first bytes are not an MP3 frame header, I have no way of knowing when a complete decodable frame has been delivered by the encoder.
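
One way to check that observation directly is to test the returned bytes for an MPEG audio sync word (the first 11 bits of a frame header are all ones). A rough sketch, not a full header parser:
Code: [Select]
#include <stdint.h>

/* Returns 1 if p points at something shaped like an MPEG-1 Layer III
   frame header: 11-bit sync word, MPEG-1 version bits, Layer III bits.
   A real parser would also validate the bitrate/samplerate fields. */
static int looks_like_mp3_header(const uint8_t *p)
{
    if (p[0] != 0xFF || (p[1] & 0xE0) != 0xE0)   /* 11-bit sync word */
        return 0;
    if ((p[1] & 0x18) != 0x18)                   /* version: MPEG-1 */
        return 0;
    if ((p[1] & 0x06) != 0x02)                   /* layer: III */
        return 0;
    return 1;
}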

Quote
Then, for streams that have a determined size (i.e. files on disk), there is an end, so there is a way to tell the encoder that there will be no more calls and that it needs to output whatever is left. Not only that, but there are also the Xing/LAME tags (VBR seek information, gapless playback...). On a continuous stream, flush is not needed because there will always be another "buffer" call, except when you stop streaming.


This makes perfect sense to me for files on disk.

Quote
* On the RTP packets:
IIRC, the streamed packets are of a fixed size. Since the MP3 frames can differ in size (ABR/VBR), or simply be of a size not divisible by the RTP packet size (the various CBR sizes), it is up to you to tell how you are dividing the data you send, so that the receiving end can start decoding on the proper frame without having to skip partial data until it finds a frame start.


This part of your response is still confusing to me. The streamed RTP pkts are not a fixed size; they can be any size as long as they fit within the MTU of the lower networking layers. If a single decodable MP3 frame is too large for a single RTP pkt, it is sent in several RTP pkts. All but the final RTP pkt will have the Marker bit in the RTP header cleared, and the final RTP pkt has the Marker bit set. That way the receiver knows how to reconstruct a single complete decodable MP3 frame regardless of its size and regardless of how many RTP pkts transported it.

But in order to packetize the MP3 frame in RTP, I must start with a completely decodable MP3 frame in the first place. The point you are making is exactly why I am confused, because lame_encode_buffer() *appears* (at least to my confused understanding) not to return encoded data on an MP3 frame boundary. If I am not assured that I have a completely decodable MP3 frame to begin with, then I do not know how to packetize that data across one or more RTP pkts so the receiver knows how to decode that single MP3 frame.
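
To make the packetization concrete, here is a sketch of splitting one complete MP3 frame across RTP packets using the RFC 2250 fragmentation offset; send_rtp() is a hypothetical transport hook, and the marker-bit handling simply follows the description above, so treat this as illustrative rather than a reference implementation:
Code: [Select]
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define RTP_MAX_PAYLOAD 1400   /* illustrative; must fit within the MTU */

/* hypothetical hook: sends one RTP packet with the given payload and
   with the RTP header's marker bit set as requested */
extern void send_rtp(const uint8_t *payload, size_t len, int marker);

/* Packetize one *complete* MP3 frame: every packet's payload starts
   with the 4-byte RFC 2250 header (16 zero bits + fragmentation offset). */
static void packetize_mp3_frame(const uint8_t *frame, size_t frame_len)
{
    uint8_t payload[4 + RTP_MAX_PAYLOAD];
    size_t  off = 0;

    while (off < frame_len) {
        size_t chunk = frame_len - off;
        if (chunk > RTP_MAX_PAYLOAD)
            chunk = RTP_MAX_PAYLOAD;

        uint16_t mbz  = 0;
        uint16_t foff = htons((uint16_t)off);   /* 0 on a frame boundary */
        memcpy(payload,     &mbz,  2);
        memcpy(payload + 2, &foff, 2);
        memcpy(payload + 4, frame + off, chunk);

        off += chunk;
        /* marker set on the last fragment, as described above */
        send_rtp(payload, 4 + chunk, off == frame_len);
    }
}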

-Andres

when do I flush with lame?

Reply #6
Hi everybody.
I am trying to accomplish a similar task and have come across the same problems.

Packets returned from the encoding procedure are probably not guaranteed to be aligned on frame boundaries, because the encoding scheme used in LAME spans multiple frames to optimize quality and bitrate.
I could achieve frame boundary preservation by disabling the bit reservoir technique using the API call
Code: [Select]
lame_set_disable_reservoir(gfp, 1);


This is just a nasty trick to help in the debugging process.
Indeed, as lvqcl pointed out, the frontend/mp3rtp application (it must be compiled with --enable-mp3rtp) feeds data as it is encoded, with a variable number of encoded bytes in each packet (so there is no frame boundary at the start of the RTP packet), and it is supposed to work.

Hope this is of some help.
Best regards,
Giulio

when do I flush with lame?

Reply #7
Thank you for your response Giulio.

I agree. I have used lame_set_disable_reservoir() and it does help a bit, but I am still having issues with the LAME API.

Whenever I cannot get something to work correctly, I always assume that it is me, that is, that I have coded it wrong or am doing something incorrectly. However, in my opinion, the RTP code in the frontend subdirectory is not correct.

Using Wireshark, I captured some RTP/MP3 stream traffic that did indeed stream correctly to a VLC client; it played perfectly. All of the RTP pkts were of the exact same size and had a consistent and reasonable RTP timestamp increment (I forget exactly what that increment was, but it was a bit over 1000). Every RTP pkt had the RTP payload start with a 4-byte fragmentation header (set to 0) as specified by RFC 2250, and then a valid MP3 frame header. So, every RTP pkt started with an MP3 frame header.

The RTP code in the frontend subdirectory uses a timestamp increment of 5, which, IMHO, cannot be correct. And even when I configure the LAME API to run in CBR, I get a wide range of return values from lame_encode_buffer(). The only way to get a constant return from lame_encode_buffer() is to turn the bit reservoir off as you suggested, and even then the number of encoded bytes returned from lame_encode_buffer() varies by 1 byte every 10 or so encoded frames. So it is hard for me to believe that LAME is producing a constant encoded stream even with the bit reservoir turned off. And I do not know how to find frame boundaries when the bit reservoir is turned on without manually parsing the encoded stream returned from lame_encode_buffer().

But then I am sure that it is still *me*, that I still do not understand MP3 sufficiently, and that I am not using the LAME API correctly. :-)

So I am continuing to struggle with this.

-Andres

when do I flush with lame?

Reply #8
Hi Andres.
The mp3rtp application does some nasty things; indeed, it does not correctly increment the RTP timestamp. The timestamp should run at a 90 kHz tick rate and should advance by one frame's worth of samples (1152) per MP3 frame. In my case I am encoding PCM audio at 48 kHz, so one frame is 24 ms and the timestamp should increase by 2160 ticks each time (assuming 1 frame per packet).
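In code form, that increment is just the frame duration converted to the 90 kHz RTP clock; a one-line sketch:
Code: [Select]
#include <stdint.h>

/* RTP timestamp ticks per MP3 frame at the 90 kHz RTP clock:
   1152 samples / 48000 Hz = 24 ms, and 0.024 s * 90000 = 2160 ticks.
   Note the division is not exact at 44.1 kHz (2351.02...), so there
   the fractional remainder would have to be accumulated. */
static uint32_t rtp_ticks_per_frame(uint32_t sample_rate)
{
    return (uint32_t)(1152ULL * 90000 / sample_rate);
}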
It is important to have the 4-byte all-zero header inside the payload before the actual MP3 frame header.

Yes, 1 byte more or less returned by the encoding routine is expected (at least when disabling the bit reservoir and using CBR). It varies from time to time because the exact frame size is fractional, so a padding byte is added to some frames.
There is a calculation that predicts the number of bytes the encoder uses per frame:

FrameSize = 144 * BitRate / SampleRate + Padding
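
As a sketch (bitrate in bits per second; the example figures in the comment are illustrative):
Code: [Select]
/* MPEG-1 Layer III frame size in bytes; padding is 0 or 1.
   e.g. 144 * 128000 / 48000 = 384 exactly, so no padding is needed,
   while 144 * 128000 / 44100 = 417.96..., so frames alternate between
   417 and 418 bytes (the "1 byte more or less" mentioned above). */
static int mp3_frame_size(int bitrate_bps, int sample_rate, int padding)
{
    return 144 * bitrate_bps / sample_rate + padding;
}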

I also set the input and output sample rates to the same value (48 kHz).
I could play the stream with VLC by using rtp://<mcast addr>:<port>

Here is the sample code I am using to initialize the encoder:
Code: [Select]
        /* AUDIO_CHAN, AUDIO_FREQ, quality, chunk_size and gfp are defined
           elsewhere in the program; br_level was not shown in the original
           post, so the kbps values below are illustrative stand-ins */
        int br_level[] = {96, 128, 192};   /* kbps, indexed by quality */
        int ql_level[] = {7, 5, 2};

        // init LAME MP3 encoder
        gfp = lame_init();

        lame_set_num_channels(gfp, AUDIO_CHAN);
        lame_set_in_samplerate(gfp, AUDIO_FREQ);
        lame_set_out_samplerate(gfp, AUDIO_FREQ);
        lame_set_brate(gfp, br_level[quality]);
        lame_set_mode(gfp, JOINT_STEREO);
        lame_set_VBR(gfp, vbr_off);
        lame_set_disable_reservoir(gfp, TRUE);
        lame_set_quality(gfp, ql_level[quality]);   /* 7=low  5=medium  2=high */

        lame_init_params(gfp);

        // an MP3 frame is 1152 samples per channel; as 16-bit stereo PCM
        // that is 1152 * 2 channels * 2 bytes = 4608 bytes per frame
        chunk_size = 4608;


and I use the following for compressing the PCM data:
Code: [Select]
        /* encbuf holds interleaved 16-bit PCM (chunk_samples per channel);
           buf receives up to max_len bytes of encoded MP3 data */
        *len = lame_encode_buffer_interleaved(gfp, (short *) encbuf, chunk_samples, buf, max_len);


I am using live555 to stream the audio and it works quite well.
I would suggest first creating an MP3 file by appending the various encoded frames returned by LAME and checking that it is correctly encoded by playing it back on the PC.
Then check that the audio is correctly streamed over RTP by using Wireshark and VLC to play it back.
Live555 includes the live555MediaServer utility, which may be used as a reference RTP stream for comparison, to help spot errors in the RTP protocol.

Best regards,
Giulio