Mojibake

Saga_Musix - 01:20 1 March 2014 #

There are obviously tons of prods in the database where charset conversion failed, i.e. result files were interpreted using the wrong charset. With the most typical European umlauts it's often easy to figure out the intended name, but not always. I've created a tag (so far with rather randomly chosen prods, just stuff I came across in a random search) that helps locate such prods, so that we can give them proper names:
http://demozoo.org/productions/tagged/mojibake/

For Cyrillic mojibake, this tool has proven really useful. If you find names that consist entirely of strange characters, you can check whether they make sense in Cyrillic and fix them: http://www.online-decoder.com/ru/ei
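
For anyone who would rather check offline, the same idea can be sketched in a few lines of Python. This is only a rough sketch and not how the linked tool works: the candidate list (cp1251, KOI8-R, cp866) and the function names are assumptions, and it assumes the garbled name was produced by reading Cyrillic bytes as Windows-1252.

```python
# Hypothetical helper names; the candidate code pages are an assumption.
CANDIDATES = ("cp1251", "koi8_r", "cp866")


def to_bytes(text):
    """Turn a garbled name back into the bytes it was (probably) read from."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Windows-1252 leaves a few bytes (e.g. 0x81) undefined; they
            # usually survive as control characters, so fall back to Latin-1.
            # Characters outside both code pages still raise an error here.
            raw += ch.encode("latin-1")
    return bytes(raw)


def cyrillic_guesses(text):
    """Decode the recovered bytes with each candidate Cyrillic code page."""
    raw = to_bytes(text)
    guesses = {}
    for codec in CANDIDATES:
        try:
            guesses[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # this code page has no character for one of the bytes
    return guesses


guesses = cyrillic_guesses("Ïðèâåò")
# guesses["cp1251"] is "Привет"; the other candidates decode to nonsense,
# so a human still has to pick the reading that makes sense.
```

The function returns every candidate that decodes at all, because telling real Russian from nonsense still needs someone who can read it.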

dipswitch - 14:01 1 March 2014 #

Very good point, thanks for taking the initiative on this one! I have fixed several Hungarian and Russian titles, but it's good to collect them centrally.

dipswitch - 14:06 1 March 2014 #

If you have any questions about whether a Cyrillic prod title is correct, please feel free to ask me, since I'm a native Russian speaker.

Saga_Musix - 18:58 1 March 2014 #

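Here's a mapping of umlauts and other accented characters from typical DOS code pages (left: the intended character) to what they turn into when the same bytes are read as Windows ANSI (right):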
å > †
ä > „
ö > ”
ü > non-printable character :(
ß > á
Ä > Ž
Ö > ™
Ü > š
à > …
á > (non-breaking space)
ò > •
ó > ¢
ù > —
ú > £
â > ƒ
ô > “
û > –
æ > ‘
Æ > ’
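
The list above can be regenerated rather than typed by hand. The following is a minimal sketch that assumes the DOS side is code page 437 and that "Windows ANSI" means Windows-1252; the names `CHARS`, `as_windows_ansi` and `TABLE` are made up for illustration.

```python
# Characters from the list above, in the same order.
CHARS = "åäöüßÄÖÜàáòóùúâôûæÆ"


def as_windows_ansi(ch, dos_codepage="cp437"):
    """Return what a DOS-encoded character looks like when read as cp1252."""
    raw = ch.encode(dos_codepage)
    # Bytes that Windows-1252 leaves undefined (such as 0x81 for "ü")
    # come out as U+FFFD, the replacement character.
    return raw.decode("cp1252", errors="replace")


# Maps each intended character to its mojibake form.
TABLE = {ch: as_windows_ansi(ch) for ch in CHARS}
```

Passing `dos_codepage="cp850"` instead would be one way to check whether another DOS code page gives the same result for a given character.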

Hint: Searching for single consonants will bring up many prods with broken umlauts.

Saga_Musix - 19:47 1 March 2014 #

Using the table from above, I have cleared everything from the mojibake tag so far.
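
Applying the table by hand can also be approximated in code. The sketch below is an assumption about how one might do it, not what was actually used here: it takes a broken title, recovers the original bytes via Windows-1252, and decodes them as a DOS code page (cp437 by default). The function name `fix_dos_mojibake` is hypothetical.

```python
def fix_dos_mojibake(text, dos_codepage="cp437"):
    """Reinterpret a title that was read as cp1252 but stored in a DOS code page."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # The "non-printable" case: bytes like 0x81 have no cp1252
            # character and tend to survive as control characters.
            raw += ch.encode("latin-1")
    return raw.decode(dos_codepage)


fixed = fix_dos_mojibake("M„dchen")  # "Mädchen"
```

This should only be run on names already known to be broken: a correctly stored "á" would be turned into "ß", so the result still needs a look by a human.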
