Full Version: foobar2000 0.9 - CUE sheets, UTF-8 and BOM
Hydrogenaudio Forums > Hosted Forums > foobar2000 > General - (fb2k)
Fandango
Once and for all. How to do it the right way?

UTF-8 or UTF-16? BOM or no BOM?

I can't get fb2k to load an embedded Unicode CUE sheet (UTF-8 BOM). It gives me the "Error parsing cuesheet: unknown cuesheet item (line 1)" error.

Loading it as an external one works somehow, except that I also get the "Error parsing cuesheet: unknown cuesheet item (line 1)" error for each track.

UTF-16 doesn't work at all.
thuan
I use Notepad++ to edit UTF-8 cue sheets and they work fine with foobar (there's no BOM here). If I understand correctly, UTF-8 doesn't need a BOM (why would a byte-order mark be needed for a byte-oriented encoding?); only UTF-16 and wider encodings need it. UTF-16 doesn't seem to work with foobar here either.
Fandango
My trouble with Unicode cue sheets is partly rooted in my own confusion about all the Unicode formats and in the (in my opinion) not very straightforward handling of Unicode in my preferred editor, UltraEdit-32.

For instance, UltraEdit-32 always handles Unicode files internally as UTF-16, no matter whether they're UTF-8 or UTF-16 on disk. So when I switch to Hex view, I always see UTF-8 characters as double bytes. Also, whether a BOM is present in the current file is kind of hidden, and changing the setting for whether newly created Unicode files get a BOM only takes effect after restarting UltraEdit-32. Additionally, there are many "Conversion" methods accessible from the menu, and UTF-16 is never explicitly mentioned; instead "Unicode" is used for it, whereas "UTF-8" stands for UTF-8...
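
To make the hex-view confusion concrete, here is a minimal Python sketch (not tied to UltraEdit-32 or foobar2000) that prints the bytes of the same characters in UTF-8 and in UTF-16. The sample characters are only illustrative.

```python
# Compare the on-disk bytes of the same text in UTF-8 and UTF-16 (little endian).
# An editor that shows its internal UTF-16 buffer in hex view would display
# the right-hand column even for a file stored as UTF-8.

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

for ch in ("a", "ú"):
    print(ch, "UTF-8:", hex_bytes(ch, "utf-8"), "| UTF-16LE:", hex_bytes(ch, "utf-16-le"))
```

An ASCII letter takes one byte in UTF-8 but two in UTF-16, which is why a UTF-16-backed hex view makes a UTF-8 file look "double byte".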

I've asked a similar question before in the fb2k forum, but that post was full of errors and misconceptions I had at that time (and maybe still have) and it's not worth linking here...

But now I've found a way to make sure that the CUE sheets are saved exactly the way I want them: when using "Save as..." instead of "Save", I can force a specific Unicode format, including BOM status, and even UTF-16 is mentioned in the file requester dialog...

So it is UTF-8 without BOM...? That can't be right.

When I save UTF-8 without BOM I get the error "could not enumerate tracks (Object not found) on:" (the first Unicode encoded character is in the file name of the image, in this case a "ú").

Saving as UTF-8 with a BOM works. foobar2000 accepts this file without complaint.

And I just checked back with another hex editor, since UE's hex view is unusable with Unicode files (it uses UTF-16 internally). When using the "Save as..." file requester, the "UTF-8 NO BOM" and "UTF-8" (with BOM) outputs are exactly the same, except for the BOM. So I guess it's UTF-8 with BOM after all... I'll now check with Notepad++ and see how it actually saves those files; maybe the way UE-32 does it isn't correct?
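
For anyone who wants to do the same check without a hex editor, this is a minimal Python sketch that looks at the first bytes of a file and reports which BOM, if any, is there. The function name `detect_bom` and the file name in the usage comment are hypothetical; UTF-32 BOMs are deliberately not handled.

```python
import codecs

def detect_bom(data: bytes) -> str:
    """Return a label for the BOM at the start of data, or 'none'.

    Only UTF-8 and UTF-16 BOMs are checked; a UTF-32 LE file would be
    misreported as 'utf-16-le' because its BOM starts with the same bytes.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    return "none"

# Usage (hypothetical file name):
# with open("album.cue", "rb") as f:
#     print(detect_bom(f.read(4)))
```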
thuan
Yeah, you're right. Sorry about the previous post; it came from memory and I hadn't rechecked my Notepad++ settings (it has another option, "UTF-8 without BOM", that I've never touched because I don't use it). So foobar only works with a UTF-8 cue file that has a BOM (the BOM here is only used as a signature stating that the file is UTF-8 and has nothing to do with byte order, because UTF-8 always has the same byte order); I'm sure of that now. If you want to know more about UTF and BOM, then go here.
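
A minimal Python sketch of the "BOM as signature" point: encoding the same text with and without the signature gives identical bytes apart from the three-byte prefix. The cue sheet content and file name below are made up for illustration.

```python
# Hypothetical cue sheet text with a non-ASCII character in the file name.
cue_text = 'FILE "Música.wav" WAVE\n  TRACK 01 AUDIO\n    INDEX 01 00:00:00\n'

without_bom = cue_text.encode("utf-8")      # plain UTF-8
with_bom = cue_text.encode("utf-8-sig")     # UTF-8 preceded by the signature EF BB BF

# The payload is byte-for-byte the same; only the prefix differs.
print(with_bom[:3].hex(" "))
print(with_bom[3:] == without_bom)

# The "utf-8-sig" decoder strips the signature if present and also accepts
# input that has none, so both variants decode to the same text.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig") == cue_text)
```

In Python, opening a file for writing with `encoding="utf-8-sig"` produces the BOM variant that foobar2000 accepted in this thread.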

Hope that helps.
Fandango
Ah, yeah, so the way Unicode characters are encoded in UTF-8 is always the same, right? No matter whether a BOM exists or not? I wasn't sure about that, because I tend to forget such things when I'm not confronted with them regularly...

There's a way to detect UTF-8 files without a BOM, as I recall, but obviously foobar2000 doesn't do this detection. OT: Unfortunately it seems EAC complains about BOMs ("Unknown keyword in line 1") but does recognise the UTF-8 encoding correctly afterwards... Of course there's no complaint about UTF-8 files without a BOM, but then it does not detect the UTF-8 encoded characters and writes them as garbage chars depending on the codepage.
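
One common way to do that detection is simply to try a strict UTF-8 decode and fall back to a legacy codepage if it fails. The Python sketch below shows the idea; it is an assumption about how such a check could look, not what foobar2000 or EAC actually do, and the function name and the `cp1252` fallback are hypothetical choices.

```python
from typing import Tuple

def decode_cue_bytes(data: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Decode cue sheet bytes, returning (text, encoding_used).

    Strict UTF-8 is tried first ("utf-8-sig" also strips a BOM if present).
    Non-ASCII text in a legacy codepage is rarely valid UTF-8, so a failed
    decode is taken as a sign that the fallback codepage applies.
    """
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback

# UTF-8 bytes for "ú", with and without BOM, are both recognised.
print(decode_cue_bytes(b"\xc3\xba"))
print(decode_cue_bytes(b"\xef\xbb\xbf\xc3\xba"))
# A lone 0xFA byte is not valid UTF-8, so the codepage fallback is used.
print(decode_cue_bytes(b"\xfa"))
# Skipping detection and reading UTF-8 as a codepage gives the "garbage chars".
print(b"\xc3\xba".decode("cp1252"))
```

Pure ASCII files also pass the UTF-8 check, which is harmless because ASCII decodes identically either way.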

EDIT: Damn it! I just discovered that when I load a UTF-8 cue sheet, the Unicode characters are converted to ASCII characters! And I don't mean 8-bit codepage garbage: "ú" becomes "u" and "–" (en dash) becomes "-" (minus). Sorry, I forgot to "reload info from file"; that was how the ASCII cue sheet looked. I always forget that I have to update manually so that fb2k recognises the changes...