Bug in foo_id3v2 1.10 in foobar 0.77a
Hydrogenaudio Forums > Hosted Forums > foobar2000 > 3rd Party Plugins - (fb2k)
bigboo
foo_id3v2 1.10 doesn't read accented letters such as "é è ë ï ê à" correctly.

- Load an MP3 in foobar (using ID3v2+ID3v1 tag support)
- Tag it with TITLE = é è ë ï ê
- Click "update file"
- Click "reload info from file": it displays garbage such as ←│ (squares in Unicode) instead of é è

But if you load the file in Winamp, the tag is read correctly.

The bug only occurs when reading tags, not when writing them.
There is no problem when you use id3v1 support only.

(Using foobar 0.77a, foo_id3v2 1.10 and Windows 2000 SP4, French version.)
In the foo_id3v2 config everything is unchecked (default options).
anza
I tried tagging a file as you suggested, but it works fine.


And doesn't this belong in the 3rd Party Plugins forum?
bigboo
That's strange; the problem is 100% reproducible on my system. Maybe it's a Windows 2000-only problem.

Also, when I enable "Write ISO-8859-1 tags instead of UTF-16" in the id3v2 plugin configuration, the problem no longer occurs when tagging files.
It seems that the bug only occurs when reading ID3v2 tags in UTF-16 on my system.
Zoominee
QUOTE(bigboo @ Jan 6 2004, 03:21 PM)
That's strange; the problem is 100% reproducible on my system. Maybe it's a Windows 2000-only problem.

Also, when I enable "Write ISO-8859-1 tags instead of UTF-16" in the id3v2 plugin configuration, the problem no longer occurs when tagging files.
It seems that the bug only occurs when reading ID3v2 tags in UTF-16 on my system.

I agree, it's a "bug". The problem is that Windows NT, 2000, 98, ME and 95 don't support Unicode letters. For some reason, if you haven't checked the ISO... checkbox, foobar writes the tag as UTF-16. Then characters like é and ä don't display properly when the tag is read back from the file.
What I don't understand is this: if they can be saved as ISO (so they are obviously included in the OS's character table), why is the computer too stupid to work out the correct symbol for the UTF representation of letters like é and ä when they are stored in a UTF tag?
I agree with your solution: as long as you're not planning to upgrade to a newer version of Windows, you'll want to keep writing all your tags as ISO-....
Mike Giacomelli
Windows 2000 does support Unicode:

http://support.microsoft.com/default.aspx?...&NoWebContent=1
madah
Same problem here. Running foobar2k 0.7.7a and foo_id3v2 1.10 on WinXP SP1 English version.

Remember to enable ID3v2 in Preferences -> Playback -> Standard Inputs, and set tags to be written to ID3v2.

Looking at the file with a hex-editor reveals the problem:

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 13 00 00 01 e9 ff 20 00 e8 ff 20 00 eb ff 20 .....éÿ .èÿ .ëÿ
00000020 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 00 00 .ïÿ .êÿ.........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
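To make the dump concrete, here is a minimal C++ sketch that walks the first frame the way a reader would. It assumes the ID3v2.3 layout (a 10-byte tag header, then a frame header made of a 4-byte ID, a 4-byte big-endian size and 2 flag bytes, then the text encoding byte). Because no BOM is present, it pairs the text bytes little-endian, which is how they were actually written. The names `kDump`, `TextFrame` and `parse_first_frame` are hypothetical, not taken from foo_id3v2 or id3lib.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bytes 0x00-0x26 of the first dump above: the 10-byte ID3v2.3 tag header,
// the 10-byte TIT2 frame header and the 19-byte frame body (no padding).
const std::vector<std::uint8_t> kDump = {
    0x49, 0x44, 0x33, 0x03, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x76,  // "ID3" v2.3 tag header
    0x54, 0x49, 0x54, 0x32, 0x00, 0x00, 0x00, 0x13, 0x00, 0x00,  // "TIT2", size 0x13, flags
    0x01,                                                        // encoding 1 = UTF-16
    0xe9, 0xff, 0x20, 0x00, 0xe8, 0xff, 0x20, 0x00, 0xeb, 0xff,
    0x20, 0x00, 0xef, 0xff, 0x20, 0x00, 0xea, 0xff};

struct TextFrame {
    std::string id;                      // four-character frame ID
    std::uint32_t size = 0;              // frame body size in bytes
    std::uint8_t encoding = 0;           // first byte of the body
    std::vector<std::uint16_t> units;    // text read as little-endian pairs
};

// Parse the first frame after the tag header. In ID3v2.3 the frame size is a
// plain 32-bit big-endian integer (only the tag header size is syncsafe).
TextFrame parse_first_frame(const std::vector<std::uint8_t>& tag) {
    const std::size_t f = 10;  // frames start right after the tag header
    TextFrame fr;
    fr.id.assign(tag.begin() + f, tag.begin() + f + 4);
    fr.size = (std::uint32_t(tag[f + 4]) << 24) | (std::uint32_t(tag[f + 5]) << 16) |
              (std::uint32_t(tag[f + 6]) << 8) | std::uint32_t(tag[f + 7]);
    const std::size_t body = f + 10;  // skip ID, size and flags
    fr.encoding = tag[body];
    // Text starts after the encoding byte; take one byte pair at a time,
    // low byte first (little-endian), stopping at the end of the frame body.
    for (std::size_t p = body + 1; p + 1 < body + fr.size; p += 2)
        fr.units.push_back(std::uint16_t(tag[p] | (tag[p + 1] << 8)));
    return fr;
}
```

With the bytes above, `parse_first_frame(kDump)` reports frame `TIT2`, size 19, encoding 1 and nine code units, the first of which is 0xFFE9 instead of 0x00E9.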


é is U+00E9, but it becomes U+FFE9 here because it is treated as a signed char instead of an unsigned one!

0xE9 is -23 as a signed value, and when it is converted to a larger signed type (like short or int) the sign bit is extended. So it becomes 0xFFE9, which is correct for signed values but incorrect for unsigned ones.
All chars above 0x7F have this problem.
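A short C++ illustration of the promotion described above. This is a sketch: `unicode_t` is assumed to be a 16-bit unsigned type, and `widen_signed` / `widen_unsigned` are hypothetical names used only to contrast the two conversions.

```cpp
#include <cstdint>

using unicode_t = std::uint16_t;  // assumption: 16-bit unsigned code unit

// What goes wrong: the byte is promoted to int while it is still negative
// (-23 for 0xE9), so its sign bit is copied into the upper bits
// (0xFFFFFFE9 with a 32-bit int). Truncating to 16 bits leaves 0xFFE9.
unicode_t widen_signed(signed char c) {
    int promoted = c;                         // -23
    return static_cast<unicode_t>(promoted);  // 0xFFE9
}

// The fix: reinterpret the byte as unsigned char first (233), which is
// zero-extended, giving the intended code point U+00E9.
unicode_t widen_unsigned(signed char c) {
    return static_cast<unicode_t>(static_cast<unsigned char>(c));
}
```

For the byte 0xE9, `widen_signed` returns 0xFFE9 while `widen_unsigned` returns 0x00E9; every byte from 0x80 to 0xFF is affected the same way.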

There are more bugs when enabling "Write byte order marker (BOM) in all strings":

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 15 00 00 01 fe ff e9 ff 20 00 e8 ff 20 00 eb .....þÿéÿ .èÿ .ë
00000020 ff 20 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 ÿ .ïÿ .êÿ.......
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................


The BOM (fe ff) indicates that the text is big-endian, but the text is still encoded little-endian!

CODE
FE FF  UTF-16, big-endian
FF FE  UTF-16, little-endian


http://www.unicode.org/faq/utf_bom.html#25

Furthermore, ID3v2 states that if no BOM is used, the default byte order should be big-endian. So that's another bug...
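A minimal reader-side sketch of the byte order rules above, assuming the input is the raw text of a UTF-16 field (everything after the encoding byte). It honours either BOM and, when none is present, falls back to big-endian as described in the post. `decode_utf16` is a hypothetical helper, not an id3lib function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode raw UTF-16 bytes into 16-bit code units.
// FE FF -> big-endian, FF FE -> little-endian; no BOM -> big-endian.
std::vector<std::uint16_t> decode_utf16(const std::vector<std::uint8_t>& bytes) {
    bool big_endian = true;  // default when no BOM is present
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            big_endian = true;
            pos = 2;  // skip the BOM itself
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            big_endian = false;
            pos = 2;
        }
    }
    std::vector<std::uint16_t> units;
    for (; pos + 1 < bytes.size(); pos += 2) {
        // Operands are uint8_t, so no sign extension can occur here.
        std::uint16_t hi = big_endian ? bytes[pos] : bytes[pos + 1];
        std::uint16_t lo = big_endian ? bytes[pos + 1] : bytes[pos];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;  // a trailing odd byte, if any, is ignored
}
```

Fed the start of the second dump (fe ff e9 ff), this returns 0xE9FF rather than U+00E9, which is exactly the mismatch between the announced and the actual byte order.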
kode54
Sign extension is a fault of the Unicode text writer converting from the input String type without recasting.

As for the byte order marker, id3lib is broken.

id3lib either handles the BOM backwards or was designed only with big-endian platforms in mind: it reads the characters into memory in big-endian format, then later uses functions such as ucslen() on them. It also reads the characters in their native byte order when no BOM is present.

This backwards handling led me to write a backwards writer. Now that I have corrected both reading and writing with BOM, I see that Explorer, Windows Media Player, and Winamp all handle the tags properly.
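One way to keep writer and reader consistent is to always emit the BOM that matches the byte order actually used. The sketch below builds a text frame body from encoding byte 0x01, an FF FE (little-endian) BOM and little-endian code units. It is an assumption about how such a writer could look, not the actual foo_id3v2 1.11 code, and the name `encode_text_body_le` is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Build an ID3v2.3 text frame body: encoding byte 0x01 (UTF-16), an FF FE
// BOM, then every code unit written low byte first, so the bytes really are
// in the order the BOM announces.
std::vector<std::uint8_t> encode_text_body_le(const std::u16string& text) {
    std::vector<std::uint8_t> out = {0x01, 0xFF, 0xFE};
    for (char16_t cu : text) {
        out.push_back(static_cast<std::uint8_t>(cu & 0xFF));         // low byte
        out.push_back(static_cast<std::uint8_t>((cu >> 8) & 0xFF));  // high byte
    }
    return out;  // no terminator, like the frames in the dumps above
}
```

For the single character é (U+00E9) this produces 01 ff fe e9 00, which a BOM-aware reader decodes back to U+00E9.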

Stupid miscoded library misleading me... There is a lack of documentation on how that should be handled. ID3lib is just a huge mess.

If anybody would like to write a proper ID3v2 handling library from scratch... something that can handle multiple frames of each type, and also write the same... Be my guest! :B

New v1.11 uploaded; it should fix all your UTF-16 reading AND writing problems. The BOM writing option has been renamed internally and now defaults to ON. Let me know if newly written tags stop working in any particular software...
harashin
@kode54:
Confirmed with Japanese characters in UTF-16, no problem here.
Thanks for your work.
ajuil
QUOTE(madah @ Jan 12 2004, 07:28 PM)
é is U+00E9, but it becomes U+FFE9 here because it is treated as a signed char instead of an unsigned one!

0xE9 is -23 as a signed value, and when it is converted to a larger signed type (like short or int) the sign bit is extended. So it becomes 0xFFE9, which is correct for signed values but incorrect for unsigned ones.
All chars above 0x7F have this problem.

This problem should already have been addressed in the latest id3lib. Following the tip from the analysis above, the culprit was identified in id3lib 3.8.3, io_helpers.cpp:

(String is defined elsewhere as:)
typedef std::basic_string<char> String;

size_t io::writeUnicodeText(ID3_Writer& writer, String data, bool bom)
{
...
unicode_t ch = (data[i] << 8) | data[i+1];
...
}

The problem in the above statement is that data[i+1] is SIGNED! So when it is converted to unicode_t (unsigned short), it gets sign-extended, as explained in the above quote. It could be fixed simply by casting data[i+1] to unsigned char.
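A sketch of the suggested fix as a standalone helper. The name `combine_be` is hypothetical, and `unicode_t` is assumed to be a 16-bit unsigned type. It casts both bytes rather than only `data[i+1]`: casting `data[i]` as well avoids left-shifting a negative value (undefined behaviour in C++ before C++20), even though those extra sign bits would be truncated away in the 16-bit result anyway.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

using unicode_t = std::uint16_t;  // assumption: unsigned 16-bit, as described above

// Same idea as the id3lib line `unicode_t ch = (data[i] << 8) | data[i+1];`,
// but each byte is cast to unsigned char first, so neither is sign-extended.
unicode_t combine_be(const std::string& data, std::size_t i) {
    const unsigned char hi = static_cast<unsigned char>(data[i]);
    const unsigned char lo = static_cast<unsigned char>(data[i + 1]);
    return static_cast<unicode_t>((hi << 8) | lo);
}
```

For the two-byte string 00 e9 this returns 0x00E9, whereas the original expression yields 0xFFE9 for the same input wherever char is signed.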
kode54
Yes, it might have been addressed in the latest id3lib, but I've made so many of my own changes that upgrading will probably be a pain in the ass. (Of course, I could just extract the two versions, diff them, then patch my code, but there will probably be some rejects to fix, and I might have already implemented some of their fixes.)