YALAC - File format development, Please help me do it right! |
Jun 30 2006, 02:29
Post
#1
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
YALAC - File format development
Purpose

In this thread I will ask questions regarding specific features of the file format of Yalac, my upcoming lossless audio file compressor: Are specific features really needed, and how should they be implemented in detail? I need your help, because I am often not sure what possible future users really want.

Please don't add new questions. If you are sure that something really important is missing, send me a mail. And be aware that I already have more questions prepared, but I would like to post and discuss them one after the other.

Questions

A chronological list of my questions. Each item contains the date of my first post and the state:

Closed: Discussion is over, decisions have been taken.
Active: Discussion is going on.

1) What is needed for streaming? (6/06/30). Active.

Question 1: What is needed for streaming?

The lossless codec comparison page in the wiki contains the feature "Streaming support". Unfortunately I could not find an accurate definition of streaming, neither in the comparison page nor in the discussion thread. Let me describe the current implementation of streaming support for Yalac, and please tell me if something is missing:

Yalac partitions each audio file into frames, which contain up to 250 ms of audio data. Each frame can be decoded independently; it does not need data from other frames. But it does need some general information from the file header: sampling rate, bits per sample, channels and so on. I know that FLAC repeats this info (at least for standard audio formats) in the header of each individual frame. But I really don't know why this should be necessary. What do you think?

Each Yalac frame starts with a 16 bit sync code. A player (soft- or hardware) can start at an arbitrary position within the file stream and search for the next sync code to find the probable start of the next frame.

Because the audio data itself can contain values equal to the sync code, the decoder cannot be sure that the specific value really marks a frame start. Therefore it has to try to decode the possible frame at the position of the sync code. If this fails, it has to look for the next sync code and try again.

I have written a little tool to find the optimal sync code: the value with the lowest probability of showing up randomly in the compressed audio data. The currently selected sync code will on average be found once every 80,000 bytes of compressed data. That means that a player will on average detect one wrong sync code per seek operation, and hence has to decode two frames instead of one before it can start playing. Not too bad, if you take Yalac's high decoding speed into account. Right?

Important: Players which are able to use the seek table contained within the file header do not have to deal with sync codes!

It's possible to dramatically reduce the probability of wrong sync code detections: If each frame contains the (compressed) frame length after the sync code, the decoder can jump to the position where the next frame should start. If it finds a new sync code there, it's highly probable that the position of the first sync code is valid. But the storage of the frame length needs some space, and therefore the compression will be a bit worse.

BTW: The first approach, without inclusion of the frame length, works similarly: First try to decode the frame; if this is successful, the next two bytes of the stream should contain another sync code.

Sorry, I know the explanation isn't too good, my English has to be improved...

Thomas

This post has been edited by TBeck: Jun 30 2006, 02:22
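The resynchronisation idea from the post above (find a sync code, read the frame length, check that another sync code sits where the next frame should start) can be sketched roughly as follows. This is only an illustration under assumptions: the sync value `0xA55A`, the 16 bit length field and the frame layout in `build_frame` are hypothetical, since the thread does not give Yalac's real values.

```python
import struct

# Hypothetical 16-bit sync code; the real Yalac value is not given in the thread.
SYNC = b"\xA5\x5A"


def build_frame(payload: bytes) -> bytes:
    """Assumed layout: sync code, 16-bit big-endian payload length, payload."""
    return SYNC + struct.pack(">H", len(payload)) + payload


def find_frame_start(stream: bytes, start: int = 0) -> int:
    """Return the offset of the first sync code that is confirmed by another
    sync code (or the exact end of the stream) where the next frame should
    begin. Return -1 if no candidate can be confirmed."""
    pos = stream.find(SYNC, start)
    while pos != -1:
        header_end = pos + 4
        if header_end <= len(stream):
            (length,) = struct.unpack_from(">H", stream, pos + 2)
            next_frame = header_end + length
            if stream[next_frame:next_frame + 2] == SYNC or next_frame == len(stream):
                return pos
        # False positive: the sync value occurred inside audio data. Keep looking.
        pos = stream.find(SYNC, pos + 1)
    return -1


# The first payload deliberately contains the sync value as ordinary data.
frames = [b"abc" + SYNC + b"xyz", b"hello", b"world"]
stream = b"".join(build_frame(p) for p in frames)
print(find_frame_start(stream, 5))  # skips the fake sync code inside frame 0
```

A player that joins in the middle of the first frame first hits the fake sync code, rejects it because no sync code follows at the implied frame end, and then locks onto the real start of the second frame.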
|
|
|
Jun 30 2006, 02:45
Post
#2
|
|
![]() Group: Members Posts: 1182 Joined: 19-May 05 From: Montreal, Canada Member No.: 22144 |
1) I think it's important to store the complete stream info (sampling rate, channels) in each frame: if a client connects to a stream while it is being distributed, it needs to know how to decode it. Unless you can detect the frame type from the frame itself without explicitly specifying the data, you'll need to include it.
2) Considering YALAC generally decodes at AT LEAST 50x, and frames are 250 ms (max), decoding one frame takes at most about 5 ms, so decoding two frames will take approximately 10 ms. I think the average user can take this kind of delay in decoding. However, should you make your codec gapless, you have to make sure that decoding the first frame in a file doesn't have this delay... Of course, you might have different decoder implementations that are more or less efficient, but that's another matter.

Thanks for all your hard work,
Good luck,
Tristan.
|
|
|
Jun 30 2006, 09:38
Post
#3
|
|
|
Group: Members Posts: 163 Joined: 16-January 02 Member No.: 1046 |
Question 1: What is needed for streaming?

In my opinion, a file format that supports streaming must allow the decoder to start decoding at an arbitrary position within the stream. That means that all info about the stream that the decoder needs to know must be repeated in the stream.

A generic solution would be to interleave stream info and audio data at an arbitrary, user-definable ratio, for example: (SI=stream info frame, A=audio frame)

CODE
ratio SI:A = 1:1
SI A SI A SI A SI A SI ...
-lowest streaming delay
-biggest storage overhead
-purpose: broadcast (streaming)

CODE
ratio SI:A = 1:5
SI A A A A A SI A A A A A SI ...
-higher streaming delay
-smaller storage overhead
-purpose: local storage for playback (quick seeking)

CODE
ratio SI:A = 1:n
SI A A A A A A A A A A ...
-no streaming support
-no storage overhead
-purpose: archiving (slow seeking)

The stream info frames could contain info like this: (just an incomplete example!)
- sync code
- SI frame CRC
- audio stream info (sample bit width, sampling frequency, etc.)
- current position within stream (timestamp and/or sample number)
- meta data (artist, title, etc.)

The sync code together with the SI frame CRC lowers the chance for a false positive match to practically zero.

A seek table and other non-streaming info (cue-sheets, album cover JPGs, etc.) could be included in an additional info frame at the start of the file, only. The seek table would allow players to skip as many SI frames as possible (defined by the precision/number of entries in the seek table) to reach the target position within the stream. This quick seeking feature would rely on the presence of SI frames because only those have sync codes that allow the decoder to re-sync with the audio stream. On the other hand, files without any SI frames (purpose: archiving) would require the player to do slow seeking, i.e. to decode the entire audio stream up to the target position.

This post has been edited by smack: Jun 30 2006, 10:43
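The three SI:A layouts in the post above could be produced by a small generator like the one below. It is a sketch only: the function name `interleave`, the tuple representation of frames and the convention that `ratio=0` stands for the "1:n" archiving case are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple


def interleave(audio_frames: Iterable[bytes], stream_info: bytes,
               ratio: int) -> Iterator[Tuple[str, bytes]]:
    """Yield one stream info (SI) frame before every `ratio` audio (A) frames.

    ratio=1 gives SI A SI A ..., ratio=5 gives SI A A A A A SI ...,
    and ratio=0 emits a single SI frame at the start only (archiving)."""
    for index, frame in enumerate(audio_frames):
        if index == 0 or (ratio > 0 and index % ratio == 0):
            yield ("SI", stream_info)
        yield ("A", frame)


def layout(frame_count: int, ratio: int) -> str:
    """Render the frame order as text, e.g. 'SI A A A A A SI A'."""
    frames = [b"audio"] * frame_count
    return " ".join(kind for kind, _ in interleave(frames, b"info", ratio))


print(layout(4, 1))
print(layout(6, 5))
print(layout(4, 0))
```

Because the ratio is only a parameter of the writer, the same audio frames can be re-wrapped for broadcast, playback or archiving without touching their contents.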
|
|
|
Jun 30 2006, 10:58
Post
#4
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE
In my opinion, a file format that supports streaming must allow the decoder to start decoding at an arbitrary position within the stream. That means that all info about the stream that the decoder needs to know must be repeated in the stream.

Well, you are right. I have been a bit too greedy and didn't want to give extra space for the audio format description... Therefore I only thought about the sync code.

QUOTE
A generic solution would be to interleave stream info and audio data at an arbitrary, user-definable ratio, for example: ... The stream info frames could contain info like this: (just an incomplete example!)
- sync code
- SI frame CRC
- audio stream info (sample bit width, sampling frequency, etc.)
- current position within stream (timestamp and/or sample number)
- meta data (artist, title, etc.)
The sync code together with the SI frame CRC lowers the chance for a false positive match to practically zero.

I like your idea to vary the tradeoff between space requirements for SI frames and the seek efficiency! Probably I will not use separate SI frames, but instead set a flag within the regular data frames to indicate that this one contains extended information. I want to provide every frame with a sync code anyway, to make it easier to find the frames if a damaged file has to be reconstructed. If this flag immediately follows the sync code, the decoder can easily search for a frame with extended info by adding this bit to the sync pattern.

QUOTE
A seek table and other non-streaming info (cue-sheets, album cover JPGs, etc.) could be included in an additional info frame at the start of the file, only. The seek table would allow players to skip as many SI frames as possible (defined by the precision/number of entries in the seek table) to reach the target position within the stream. This quick seeking feature would rely on the presence of SI frames because only those have sync codes that allow the decoder to re-sync with the audio stream. On the other hand, files without any SI frames (purpose: archiving) would require the player to do slow seeking, i.e. to decode the entire audio stream up to the target position.

Yes. The definition of a meta data format is already on my list.

Many thanks
Thomas
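The flag idea from this reply, a bit directly after the sync code that marks frames carrying extended stream info, might look like this in a byte-oriented reader. The sync value and the choice of the top bit of the following byte as the flag are hypothetical; the post only says that the flag should immediately follow the sync code.

```python
# Hypothetical values: the thread does not specify the sync code or flag position.
SYNC = b"\xA5\x5A"
EXT_INFO_FLAG = 0x80  # top bit of the byte right after the sync code


def find_sync(stream: bytes, start: int = 0, need_stream_info: bool = False) -> int:
    """Return the offset of the next sync code at or after `start`.

    With need_stream_info=True, only frames whose flag bit is set are accepted,
    which effectively extends the 16-bit sync pattern by one bit."""
    pos = stream.find(SYNC, start)
    while pos != -1:
        flag_at = pos + 2
        if flag_at < len(stream):
            has_info = bool(stream[flag_at] & EXT_INFO_FLAG)
            if has_info or not need_stream_info:
                return pos
        pos = stream.find(SYNC, pos + 1)
    return -1


# Three frames; only the middle one carries extended stream info.
stream = SYNC + b"\x00aaa" + SYNC + b"\x80bbb" + SYNC + b"\x00ccc"
print(find_sync(stream))                         # any frame
print(find_sync(stream, need_stream_info=True))  # first frame with stream info
```

A client joining a running stream would call the search with `need_stream_info=True` once and could then fall back to plain sync codes for every later frame.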
|
|
|
Jun 30 2006, 10:58
Post
#5
|
|
|
Group: Members Posts: 23 Joined: 7-February 05 Member No.: 19656 |
CODE
ratio SI:A = 1:n
SI A A A A A A A A A A ...
-no streaming support
-no storage overhead
-purpose: archiving (slow seeking)

Seeking will be "slow" only if the user drops both the SI frames and the seek table (a nice idea, as for me...). And if you decide to implement SI frames, it would be nice to drop the sync codes in the usual frames... And maybe it will be possible to add SI frames "on the fly" when streaming.
|
|
|
Jun 30 2006, 11:09
Post
#6
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE (date: Jun 30 2006, 03:45, post: 407523)
1) I think it's important to store the complete stream info (sampling rate, channels) in each frame: if a client connects to a stream while it is being distributed, it needs to know how to decode it. Unless you can detect the frame type from the frame itself without explicitly specifying the data, you'll need to include it.

You are totally right!

QUOTE (date: Jun 30 2006, 03:45, post: 407523)
Thanks for all your hard work,

Better wait until a working release! Currently you are getting nothing useful from my work...

Thanks
Thomas

QUOTE
And - if you decide to implement SI frames, it would be nice to drop sync codes in usual frames... And, may be, it will be possible to add SI frames "on the fly" - when streaming.

Hm, that would be a good reason for separate SI frames (although I do not really like them).
|
|
|
Jun 30 2006, 11:58
Post
#7
|
|
|
Group: Members Posts: 23 Joined: 7-February 05 Member No.: 19656 |
Is it impossible to extend/reformat a "usual" frame (add stream info) on the fly? IMHO, separate SI frames have an advantage only if you drop the sync codes in "usual" frames... Hm, or if there will be a common format for "non-usual" frames (currently stream info, but in future you may need to add more types?)
This post has been edited by Zergen: Jun 30 2006, 12:00 |
|
|
|
Jun 30 2006, 12:08
Post
#8
|
|
|
Group: Members Posts: 163 Joined: 16-January 02 Member No.: 1046 |
Have you considered one of the existing container formats for your codec?
Using one of them would mean that you wouldn't have to re-invent the wheel. |
|
|
|
Jun 30 2006, 12:50
Post
#9
|
|
|
Group: Members Posts: 23 Joined: 7-February 05 Member No.: 19656 |
QUOTE
Have you considered one of the existing container formats for your codec? Using one of them would mean that you wouldn't have to re-invent the wheel.

I did a quick comparison of native FLAC vs FLAC in Matroska: the size difference is about 0.04%. Maybe that's acceptable... but I know nothing about the internals of FLAC or Matroska.

This post has been edited by Zergen: Jun 30 2006, 12:52
|
|
|
Jul 1 2006, 11:39
Post
#10
|
|
![]() Group: FB2K Moderator (Donating) Posts: 3809 Joined: 24-February 03 Member No.: 5153 |
Repeating stream information has little to do with seeking. If you can access arbitrary positions in a file, you can just grab the stream information or seek table from wherever it is usually located; the file start would be a logical choice for stream information in that case to allow decoding without seeking in the file, for example through a pipe.
The repeated stream information comes into play when you have an (unseekable) stream that does not allow random access. The frequency at which the format information is embedded in the stream will in that case determine the average latency between the client connecting to the stream and being able to decode audio data. Sync codes are also more useful for synchronizing the decoder to an unseekable stream or when recovering portions of a corrupted file than they are for seeking to a given audio position in a file with random access. Seeking to a given audio position by estimating the required file offset (for example based on the duration and the filesize) is a rather crude way to implement seeking, and accurate seeking is impossible using this approach, if the file does not contain timestamps. It would be quite inefficient if you had to decode the file from the beginning or some already known position to implement sample accurate seeking. (It would also make certain people quite bitchy.) I think it would be a good idea to consider using an established container format instead of doing that all yourself. -------------------- http://foosion.foobar2000.org/ - my components for foobar2000
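The difference between the two seeking strategies discussed in this post can be made concrete with a small sketch. The seek table layout (a sorted list of `(first_sample, byte_offset)` pairs) and all numbers are made up for the example; no real container format is implied.

```python
from bisect import bisect_right
from typing import List, Tuple


def seek_with_table(seek_table: List[Tuple[int, int]],
                    target_sample: int) -> Tuple[int, int]:
    """Look up the last seek point at or before `target_sample`.

    Returns (byte_offset, samples_to_skip): the decoder starts at byte_offset
    and discards samples_to_skip decoded samples, which makes the seek
    sample accurate."""
    starts = [first_sample for first_sample, _ in seek_table]
    index = bisect_right(starts, target_sample) - 1
    if index < 0:
        raise ValueError("target lies before the first seek point")
    first_sample, byte_offset = seek_table[index]
    return byte_offset, target_sample - first_sample


def estimate_offset(target_sample: int, total_samples: int, file_size: int) -> int:
    """Crude alternative: assume a constant bitrate and guess the byte offset.
    The guess still has to be followed by a sync code search, and without
    timestamps the decoder cannot tell which sample it actually landed on."""
    return file_size * target_sample // total_samples


# Hypothetical table with one seek point every 11025 samples.
table = [(0, 64), (11025, 7000), (22050, 15500), (33075, 21000)]
print(seek_with_table(table, 25000))
print(estimate_offset(25000, 44100, 28000))
```

With the table the player knows exactly how many samples to discard after the seek point; the estimate only yields a byte position near the target.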
|
|
|
|
Jul 2 2006, 10:46
Post
#11
|
|
|
Group: Members Posts: 23 Joined: 7-February 05 Member No.: 19656 |
In the other thread we have a discussion about a per-frame CRC... Maybe it can be moved into another type of "special" frame, so that it can be included or excluded by the user?
By the way, I don't see how I can use these checksums: if the MD5 is broken, I prefer to delete the file instead of keeping a "partially lossless" one. And such things (stream info, frame sync codes, CRC32) could easily eat most of the compression superiority of YALAC. So, IMHO, it would be best to allow the user to skip all unneeded things. Personally I use only Monkey's Audio in my archive, mostly because it gives me good compression and a minimum of other stuff. But it isn't developed anymore (and the Foobar2000 developer doesn't like it), so I hope that YALAC will be a good choice, if it has the same compression ratio with a comparable compression time, like the HIGH preset (I don't care much about decompression: if the speed is good enough to burn a disc on the fly, that's enough).
|
|
|
Jul 3 2006, 06:11
Post
#12
|
|
![]() Group: Members Posts: 364 Joined: 16-November 03 Member No.: 9867 |
Best choice would be to integrate YALAC into FLAC.
|
|
|
|
Jul 3 2006, 06:50
Post
#13
|
|
![]() Group: Members Posts: 31 Joined: 1-June 06 Member No.: 31342 |
QUOTE
Best choice would be to integrate YALAC into FLAC.

I think that integrating YALAC into FLAC is a bad idea because:
a) FLAC files created with YALAC will be incompatible with existing FLAC decoders. Most people will avoid using the new method because they don't want to have compatibility problems. An analogy is the ZIP format: BZIP2 is a part of the ZIP standard, but nobody uses it because only a few programs can decompress it.
b) All players and hardware that want to support the FLAC format must be able to decode the FLAC algorithm and the YALAC algorithm. That doesn't help FLAC to gain support.
|
|
|
Jul 3 2006, 07:00
Post
#14
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE
I think that integrating YALAC into FLAC is a bad idea because:
a) FLAC files created with YALAC will be incompatible with existing FLAC decoders. Most people will avoid using the new method because they don't want to have compatibility problems. An analogy is the ZIP format: BZIP2 is a part of the ZIP standard, but nobody uses it because only a few programs can decompress it.
b) All players and hardware that want to support the FLAC format must be able to decode the FLAC algorithm and the YALAC algorithm. That doesn't help FLAC to gain support.

Besides the possibility of an integration of Yalac into FLAC: I am quite sure that the FLAC format will change sooner or later. While the file format is very well thought out and excellently documented, there is an important limitation in the possible parameter set of the Rice coder that definitely hurts compression at higher sample resolutions (24 bit and up). If high resolution files become more popular, there will be a need for a rework.

Just my 2 cents, I am not an expert on FLAC.
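Why a capped Rice parameter costs bits on high-resolution material can be shown with a little arithmetic. The sketch below is an assumption-laden illustration: the post does not say which limit is meant, so the cap of 14 and the sample residuals (magnitudes around 2^18, as might occur in 24 bit audio) are chosen only for the example.

```python
from typing import Sequence


def rice_bits(value: int, k: int) -> int:
    """Bits needed to Rice-code a non-negative integer with parameter k:
    a unary quotient (value >> k), one stop bit, and k remainder bits."""
    return (value >> k) + 1 + k


def total_bits(values: Sequence[int], k: int) -> int:
    """Total code length for a block of residuals with one shared parameter."""
    return sum(rice_bits(v, k) for v in values)


def best_k(values: Sequence[int], max_k: int) -> int:
    """Parameter in 0..max_k that minimises the total code length."""
    return min(range(max_k + 1), key=lambda k: total_bits(values, k))


# Hypothetical residuals of roughly 18-bit magnitude.
residuals = [200000, 310000, 150000, 262144]

capped = best_k(residuals, 14)  # assumed cap, for illustration only
free = best_k(residuals, 30)
print(capped, total_bits(residuals, capped))
print(free, total_bits(residuals, free))
```

With the cap, the unary part of every code grows because the parameter cannot follow the magnitude of the residuals; without it, the same four values need noticeably fewer bits.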
|
|
|
Jul 3 2006, 11:20
Post
#15
|
|
![]() Group: Members (Donating) Posts: 542 Joined: 19-March 04 From: Alberta, Canada Member No.: 12841 |
To me it doesn't seem worth compromising the compression ratio considerably for what is most likely a fringe use for a lossless codec (at present, at least).

As long as it can be streamed by the trial and error method, if it has to make two tries to decode the first frame, what's the worst that could happen? A slight delay, or a burst of noise before the audio starts, perhaps?

As for specifying the stream info, could YALAC have a standard where the info is optional for the most common type (i.e. stereo, 16 bit, 44.1 kHz), but any deviation from it has to be specified? For example, mono 16 bit 44.1 kHz has to specify that it is mono, because this is the only thing that differs from the default: you only need to add the symbol for mono into each frame header, but not all the other details.

That's an interesting idea that smack has about being able to customize the streaming info ratio... if you intend to stream it, you can optimize it for this purpose. Of course, odds are that a user will eventually want to stream files, but this wasn't a concern when the files were first created.

It'd be interesting for the user to have the option to custom tailor his/her YALAC encodes for specific applications, i.e. embedding StreamInfo periodically within the file at definable intervals, adding redundancy for error correction, etc. Perhaps these kinds of things could potentially be changed, added or removed on the fly, without the computation needed for re-encoding the file.

I wonder how difficult it would make third party implementation of this codec if there are lots of extra features or modes of use designed into YALAC. But if these kinds of things are going to be done, best get them right the first time, and/or reserve the ability to add such features or options later in a backward compatible way.

I really wonder about the idea of putting meta data/tag info in each frame, or at least periodically within the file. Interesting idea though. It'd make tag updating very slow and cumbersome; it would probably have to be a small tag with only a few fields of limited length (a fixed length info tag, like ID3v1). I think this would only be important for streaming over the net, so it might have to be an option specified if you're using it for such purposes.

All in all, it would be good to closely analyze existing container formats, but if you're up for it, Thomas... I think it might be possible for you to create something that takes the best advantages of each format and is better than all of them. (Not that I know much about this myself).
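One way to keep the per-frame format description small, in the spirit of the "default format" suggestion above, is a short code for common formats plus an escape for everything else. This is a table-based variant rather than the per-field scheme described in the post, and the table contents, the escape value and the field widths are all hypothetical.

```python
import struct
from typing import Tuple

AudioFormat = Tuple[int, int, int]  # (channels, bits per sample, sampling rate in Hz)

# Hypothetical table of frequently used formats; index 0 is the default.
COMMON_FORMATS = [(2, 16, 44100), (1, 16, 44100), (2, 16, 48000), (2, 24, 96000)]
ESCAPE = 0xFF  # hypothetical marker: an explicit description follows


def encode_format(fmt: AudioFormat) -> bytes:
    """One byte for a common format, seven bytes for anything else."""
    if fmt in COMMON_FORMATS:
        return bytes([COMMON_FORMATS.index(fmt)])
    channels, bits, rate = fmt
    return bytes([ESCAPE]) + struct.pack(">BBI", channels, bits, rate)


def decode_format(data: bytes) -> Tuple[AudioFormat, int]:
    """Return the format and the number of bytes consumed."""
    code = data[0]
    if code != ESCAPE:
        return COMMON_FORMATS[code], 1
    channels, bits, rate = struct.unpack_from(">BBI", data, 1)
    return (channels, bits, rate), 7


print(len(encode_format((2, 16, 44100))))   # common format
print(len(encode_format((6, 24, 192000))))  # explicit description
```

For CD-style audio the cost is a single byte per frame that carries stream info, while unusual formats remain fully describable.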
|
|
|
Jul 3 2006, 12:01
Post
#16
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE
As for specifying the stream info, could YALAC have a standard where if it is the most common type (i.e. Stereo 16 bit 44,100 KHz), it's optional, but make it mandatory to modify this... i.e. mono 16 bit 44,100 khz has to specify that it is mono, because this is the only thing that differs from default... you only need to add the symbol for mono into each frame header, but not all the other details.

Yes, it's a very common way to have a very compact representation for the frequently used types.

QUOTE
That's an interesting idea that smack has about being able to customize the streaming info ratio... if you intend to stream it, you can optimize it for this purpose. Of course, odds are that a user will eventually want to stream files, but this wasn't a concern when the files were first created.

Currently I prefer this approach:
1) By default streaming info is always inserted, but only every 2 seconds. This does not hurt compression very much, but files are always streamable.
2) If you want lower start latencies (when connecting to a running stream), you can manually increase the rate.
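The tradeoff behind the 2 second default can be estimated with a few lines of arithmetic. The 250 ms frame length comes from the first post; the size of the stream info (16 bytes) and the average compressed frame size (25,000 bytes) are assumptions picked only to get concrete numbers, and the average wait assumes that clients connect at uniformly random times.

```python
def stream_info_stats(info_interval_ms: int, frame_ms: int = 250,
                      info_bytes: int = 16, frame_bytes: int = 25_000) -> dict:
    """Estimate start latency and storage overhead for periodic stream info.

    info_bytes and frame_bytes are hypothetical example values."""
    frames_per_info = max(1, info_interval_ms // frame_ms)
    return {
        "frames_per_info": frames_per_info,
        # A client connecting at a random moment waits half an interval on average.
        "avg_wait_ms": info_interval_ms / 2,
        "max_wait_ms": info_interval_ms,
        "overhead_percent": 100 * info_bytes / (frames_per_info * frame_bytes),
    }


default = stream_info_stats(2000)  # stream info every 2 seconds
fast = stream_info_stats(250)      # stream info in every frame
print(default)
print(fast)
```

Under these assumptions the default costs well under a hundredth of a percent of file size for an average start delay of one second, which fits the remark that it "does not hurt compression very much".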
|
|
|
Jul 3 2006, 15:14
Post
#17
|
|
|
Group: Members Posts: 121 Joined: 9-March 06 From: NRW, Germany Member No.: 28371 |
Apart from running my own small Icecast server, I don't know too much about streaming. But wouldn't it be possible for the client to get all the info about the stream it needs on connecting, and then never again? This information would have to be delivered by the server itself and not be inside the stream. In general, a streaming server has all the meta information about a stream, like tags (even though limited), sample rate, bitrate or quality and some more. What else is needed to decode a frame?
|
|
|
|
Jul 3 2006, 17:22
Post
#18
|
|
|
Group: Members Posts: 236 Joined: 10-February 04 From: London Member No.: 11923 |
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.
It is the container's concern whether you want to seek anywhere in the stream or not. If you need the ability to start anywhere without reading the header, then the metadata should be repeated. If you always read the header before decoding, then it is not necessary. I think that Yalac should be designed in such a fashion that the container can be replaced with a different one if desired. And if you have a copy of the file header, you can decode the stream starting at any point. Reading a couple of frames before you can start playing is certainly fine. In my opinion support for seeking should be kept in the container, but there is probably a case for including it in the stream instead. |
|
|
|
Jul 3 2006, 20:54
Post
#19
|
|
|
Group: Members Posts: 121 Joined: 9-March 06 From: NRW, Germany Member No.: 28371 |
I don't know about sync codes, but I always streamed my audio without CRC (or so I believe) and can't remember anyone complaining about errors.
|
|
|
|
Jul 5 2006, 14:18
Post
#20
|
|
|
Group: Members Posts: 163 Joined: 16-January 02 Member No.: 1046 |
QUOTE
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.

I think it's the other way around: framing and synchronization is provided by the container format while all audio related info is stored in the embedded audio stream. The container format is handled by a software component "parser/splitter" which extracts the payload stream (here: audio) and sends it to the decoder. The decoder can only handle such a raw audio stream, it doesn't need to know anything about sync codes, checksums or error correction codes. Of course, this raw audio stream must be made up of independent frames to allow the decoder to start decoding at any frame in the stream. This is the case for most audio codecs, including YALAC.

QUOTE
I think that Yalac should be designed in such fashion that the container can be replaced with a different one if desired. And if you have a copy of the file header, you can decode the stream starting at any point. In my opinion support for seeking should be kept in the container, but there is probably a case for including it in the stream instead.

Seeking is a feature of the container format. The raw audio stream should not be overloaded with this unrelated (non-audio) stuff. For a good example of this concept (separation of container format and audio content) just have a look at Ogg and Vorbis.
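The division of labour described in this post, where a parser/splitter handles framing and checksums and hands only raw frames to the decoder, can be sketched as below. The toy container (a 4 byte length and a CRC32 in front of every payload) is invented for the example and does not correspond to Ogg, Matroska or any real format; `decode_frame` is a stand-in for an audio decoder.

```python
import struct
import zlib
from typing import Iterable, Iterator, List


def mux(payloads: Iterable[bytes]) -> bytes:
    """Toy container writer: 4-byte length, 4-byte CRC32, then the payload."""
    return b"".join(struct.pack(">II", len(p), zlib.crc32(p)) + p for p in payloads)


def demux(container: bytes) -> Iterator[bytes]:
    """Parser/splitter: checks framing and checksums, yields raw payloads."""
    pos = 0
    while pos + 8 <= len(container):
        length, crc = struct.unpack_from(">II", container, pos)
        payload = container[pos + 8:pos + 8 + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            raise ValueError(f"corrupt packet at offset {pos}")
        yield payload
        pos += 8 + length


def decode_frame(payload: bytes) -> List[int]:
    """Stand-in decoder: it sees only the raw frame, never the container fields."""
    return list(payload)


packets = [b"\x01\x02", b"\x03"]
print([decode_frame(p) for p in demux(mux(packets))])
```

Swapping the container then means replacing `mux` and `demux` only; the decoder function stays untouched, which is the replaceability argued for in the thread.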
|
|
|
Jul 5 2006, 18:09
Post
#21
|
|
|
Group: Members Posts: 121 Joined: 9-March 06 From: NRW, Germany Member No.: 28371 |
Perhaps you can ask the Matroska crew if they want to help you. I think they'll be glad to see such a promising codec natively in their container.
A nice option would be to select manually whether you want to include these error checking features when encoding. |
|
|
|
Jul 5 2006, 22:48
Post
#22
|
|
![]() Group: Members Posts: 294 Joined: 22-September 04 From: Moscow Member No.: 17192 |
Matroska really is a nice container, robust and featureful. And, what's important, it is already developed and supported by various software applications, which won't be the case with an all-new format.
-------------------- Main audio gear: H320 (Rockbox daily) + Sharp HP-MD33-S.
|
|
|
|
Jul 6 2006, 21:54
Post
#23
|
|
![]() Group: Members (Donating) Posts: 542 Joined: 19-March 04 From: Alberta, Canada Member No.: 12841 |
Using something open source, flexible, and already designed seems to be a logical choice. It might be more extensible to have the container and codec quite distinct from each other too... (Not that I really know a lot about this stuff).
|
|
|
|
Jul 6 2006, 22:31
Post
#24
|
|
![]() Group: Members (Donating) Posts: 713 Joined: 8-July 04 From: Sao Paulo Member No.: 15173 |
QUOTE
...And, what's important, it is already developed and supported by various software applications, which won't be the case with all-new format.

You lost me there. The developer may choose any container (even a proprietary one exclusive to YALAC) and users will still depend on someone providing decoding plugins/support in players. Container support without audio format support is useless. I see no relation between container choice and support of the audio format....

This post has been edited by beto: Jul 6 2006, 22:32
-------------------- http://volutabro.blogspot.com
|
|
|
|
Jul 7 2006, 00:16
Post
#25
|
|
|
Group: Members Posts: 236 Joined: 10-February 04 From: London Member No.: 11923 |
QUOTE
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.

QUOTE
I think it's the other way around: framing and synchronization is provided by the container format while all audio related info is stored in the embedded audio stream. The container format is handled by a software component "parser/splitter" which extracts the payload stream (here: audio) and sends it to the decoder. The decoder can only handle such a raw audio stream, it doesn't need to know anything about sync codes, checksums or error correction codes.

I confess that I don't know much about audio formats. The idea behind the separation I proposed is that the decoder needs to handle errors, either by fading away or by smoothing the signal. The sync codes are useful for recovering from errors. For me it makes more sense to deal with this at decoder level rather than at container level. Does a splitter know about outputting audio? The only thing it can do is send back an error message, which the decoder can do too.
|
|
|