QUOTE(madah @ Jan 12 2004, 07:28 PM)
Same problem here. Running foobar2k 0.7.7a and foo_id3v2 1.10 on WinXP SP1 English version.
Remember to enable ID3v2 in Preferences -> Playback -> Standard Inputs. Change tags to write to ID3v2.
Looking at the file with a hex-editor reveals the problem:
00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 13 00 00 01 e9 ff 20 00 e8 ff 20 00 eb ff 20 .....éÿ .èÿ .ëÿ
00000020 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 00 00 .ïÿ .êÿ.........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
é is U+00E9 but becomes U+FFE9 here, because it is treated as a signed char instead of an unsigned one!
0xE9 is -23 as a signed value, and when converting to a larger signed type (like short or int) the sign bit is extended. So it becomes 0xFFE9, which is correct for signed values but incorrect for unsigned.
All chars above 0x7F have this problem.
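To make the sign extension concrete, here is a small sketch. The function names are made up for illustration and are not from foo_id3v2 or id3lib; they just widen one byte to 16 bits, once via a signed 8-bit value and once via an unsigned one.

```cpp
#include <cstdint>

// Hypothetical helper: widen a byte after interpreting it as a SIGNED
// 8-bit value. For bytes above 0x7F the sign bit is copied into the
// upper byte of the result.
std::uint16_t widen_as_signed(unsigned char byte) {
    const auto s = static_cast<signed char>(byte);  // 0xE9 becomes -23
    return static_cast<std::uint16_t>(s);           // -23 becomes 0xFFE9
}

// Hypothetical helper: widen the same byte as an UNSIGNED value.
// The upper byte of the result is always zero.
std::uint16_t widen_as_unsigned(unsigned char byte) {
    return static_cast<std::uint16_t>(byte);        // 0xE9 stays 0x00E9
}
```

With these, widen_as_signed(0xE9) gives 0xFFE9 (the value seen in the dump, stored as e9 ff), while widen_as_unsigned(0xE9) gives 0x00E9. Bytes up to 0x7F come out the same either way.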
There are more bugs when "Write byte order marker (BOM) in all strings" is enabled:
00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 15 00 00 01 fe ff e9 ff 20 00 e8 ff 20 00 eb .....þÿéÿ .èÿ .ë
00000020 ff 20 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 ÿ .ïÿ .êÿ.......
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The BOM (fe ff) suggests that the text is encoded as big-endian, but the text is still encoded in little-endian!
CODE
FE FF UTF-16, big-endian
FF FE UTF-16, little-endian
http://www.unicode.org/faq/utf_bom.html#25
Furthermore, ID3v2 states that if no BOM is used, the default byte order should be big-endian. So that's another bug...
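For comparison, a minimal sketch of a writer where the BOM and the text cannot disagree, because both go through the same byte-ordering step. This is not the plugin's or id3lib's code; encode_utf16 and ByteOrder are names invented for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte-order selector for this sketch.
enum class ByteOrder { Big, Little };

// Hypothetical encoder: serializes UTF-16 code units in the requested
// byte order. The BOM (U+FEFF) is written through the same lambda as
// the text, so it always matches the order of the units that follow.
std::vector<std::uint8_t> encode_utf16(const std::u16string& text,
                                       ByteOrder order, bool bom) {
    std::vector<std::uint8_t> out;
    auto put = [&](std::uint16_t unit) {
        const auto hi = static_cast<std::uint8_t>(unit >> 8);
        const auto lo = static_cast<std::uint8_t>(unit & 0xFF);
        if (order == ByteOrder::Big) {
            out.push_back(hi);
            out.push_back(lo);
        } else {
            out.push_back(lo);
            out.push_back(hi);
        }
    };
    if (bom) put(0xFEFF);  // FE FF for big-endian, FF FE for little-endian
    for (char16_t unit : text) put(static_cast<std::uint16_t>(unit));
    return out;
}
```

For é (U+00E9) this yields ff fe e9 00 in little-endian with BOM, fe ff 00 e9 in big-endian with BOM, and 00 e9 in big-endian without BOM.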
This problem should have been addressed already in the latest id3lib. After receiving the tip from the above analysis, the culprit was identified in id3lib 3.8.3, in io_helpers.cpp:
(String is defined elsewhere as)
typedef std::basic_string<char> String;
size_t io::writeUnicodeText(ID3_Writer& writer, String data, bool bom)
{
...
unicode_t ch = (data[i] << 8) | data[i+1];
...
}
The problem in the above statement is that data[i+1] is SIGNED! So when it is promoted to int for the | operation, it gets sign-extended as explained in the above quote, and the extended bits survive the conversion to unicode_t (unsigned short). It could be fixed simply by casting data[i+1] to unsigned char.
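A rough sketch of what the fixed byte combination could look like, pulled out into a standalone function. This is not the actual id3lib patch: combine_fixed is a made-up name, unicode_t is assumed to be a 16-bit unsigned type as described above, and both bytes are cast here (not only data[i+1]) so that no negative value is ever shifted.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: stands in for id3lib's unicode_t (unsigned short).
using unicode_t = std::uint16_t;

// Hypothetical helper: build one 16-bit code unit from two adjacent
// bytes of a char-based string, high byte first.
unicode_t combine_fixed(const std::string& data, std::size_t i) {
    // Cast to unsigned char BEFORE the bytes are promoted to int,
    // so values above 0x7F are not sign-extended.
    const auto hi = static_cast<unsigned char>(data[i]);
    const auto lo = static_cast<unsigned char>(data[i + 1]);
    return static_cast<unicode_t>((hi << 8) | lo);
}
```

For the two bytes 00 e9 this returns 0x00E9 instead of 0xFFE9.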