What language is this?

Posted: Tue Apr 17, 2007 10:35 pm
by Solfius
I have come across a non-English character, but I think the character encoding is wrong, so all I get is mixed-up symbols.

Can anyone identify the language?

Cze¶æ

This word seems to be mostly right. If you can identify the language, do you know what encoding I should use? I'm browsing with Firefox, if that makes a difference.

Posted: Tue Apr 17, 2007 10:46 pm
by sanchez
It's Polish. But it's not displaying properly for me either, in FF. The last two characters should look like 's' and 'c' with little diacritics on top. The word means 'Hi'.

Re: What language is this?

Posted: Tue Apr 17, 2007 11:11 pm
by marol
Solfius wrote:(...) do you know what encoding I should use. I'm browsing with firefox if that makes a difference
The appropriate encoding is ISO-8859-2 (Central European). The word should appear as 'Cześć'.
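
For anyone who wants to check this outside the browser, here is a minimal Python sketch of the repair. It assumes the browser read the ISO-8859-2 bytes as Latin-1 (Western), which would explain the '¶' and 'æ'. The helper name `repair_mojibake` is made up for illustration.

```python
def repair_mojibake(text: str, shown_as: str, written_as: str) -> str:
    """Undo a wrong decoding: get the original bytes back, then decode them properly."""
    # Re-encode with the encoding the browser wrongly assumed ...
    raw = text.encode(shown_as)
    # ... and decode with the encoding the author actually used.
    return raw.decode(written_as)


# Assumption: the garbled word was displayed as Latin-1.
fixed = repair_mojibake("Cze¶æ", "latin-1", "iso-8859-2")
print(fixed)  # Cześć
```

In Firefox the equivalent step is switching the page's character encoding to Central European (ISO).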

Posted: Tue Apr 17, 2007 11:11 pm
by Solfius
Thanks, I guessed the meaning from the context

after some experimentation: Central European (ISO) seems to make it look right, but there are other notes that still look wrong

eg a note called "Ęŕđňŕ (1077)-"

the date is obvious, but I've never seen that many special characters in a single word before. Is it possibly a different language again?

Posted: Tue Apr 17, 2007 11:13 pm
by marol
There's no valid Polish word made up with only special characters. It could be Russian or (less probably) Chinese. You can also open the main Cantr page and try switching languages to see the effect.

Posted: Tue Apr 17, 2007 11:15 pm
by sanchez
The eth looks wrong. No Cantr language uses that, afaik.

Posted: Tue Apr 17, 2007 11:16 pm
by Solfius
Perhaps Russian, do you know what encoding I ought to use for that?

Is there a wiki page on this? If not, perhaps there should be.

Posted: Tue Apr 17, 2007 11:26 pm
by marol
Sorry, I don't know.

Unfortunately, when Cantr was created the Unicode standard wasn't popular enough, so the game code didn't specify any default encoding. The deciding factor in most cases was therefore the client's browser and its default settings. Text written in one encoding is hard to read for people who use other encodings (or completely unreadable in some cases).

Even worse, in some countries more than one encoding standard is in use, so the problem occurs even for texts in a single language.

There are plans/proposals to convert the Cantr database to Unicode, which would solve all of these problems. However, it would need an overwhelming amount of work: the proper encoding of every single note, name or speech would have to be estimated and then converted.
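
To give a rough idea of what "estimated and then converted" could look like for a single note, here is a small Python sketch. It is not Cantr's actual code: the function `convert_note`, the language labels and the language-to-encoding table are all hypothetical, and the table is only a guess, since one language may be written in several encodings.

```python
# Hypothetical guess table: legacy encoding most likely used for each language.
# These pairs are assumptions for illustration, not Cantr's real settings.
LEGACY_ENCODING_BY_LANGUAGE = {
    "polish": "iso-8859-2",
    "russian": "cp1251",
}


def convert_note(raw: bytes, language: str) -> str:
    """Return the note as Unicode text, guessing the legacy encoding if needed."""
    try:
        # Text that is already valid UTF-8 (including plain ASCII) is kept as is.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise fall back to the guessed encoding for the note's language.
    guess = LEGACY_ENCODING_BY_LANGUAGE.get(language, "latin-1")
    # errors="replace" marks undecodable bytes instead of aborting the conversion.
    return raw.decode(guess, errors="replace")


polish_note = "Cześć".encode("iso-8859-2")
print(convert_note(polish_note, "polish"))
```

A guess like this can still be wrong, for example when a Russian note was written in something other than Windows-1251, so a real conversion would probably need some manual checking.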

Posted: Wed Apr 18, 2007 12:06 am
by Solfius
hopefully there's a way to ensure everything from this point forwards is unicode, it can be difficult enough interacting with non-english chars without trying to get their messages to display properly

Posted: Wed Apr 18, 2007 12:53 am
by wichita
We're aware of the Unicode issue and would like to get it taken care of. Some extra help from folks who know how to program would be nice. :)

Posted: Wed Apr 18, 2007 6:23 am
by marol
Solfius wrote:hopefully there's a way to ensure everything from this point forwards is unicode, it can be difficult enough interacting with non-english chars without trying to get their messages to display properly
We cannot switch to Unicode now without dealing with the older material. For example, all notes made before the switching point would become unreadable. We have to convert them all first.

Posted: Wed Apr 18, 2007 8:28 am
by pipok
marol wrote:There's no valid Polish word made up with only special characters.

żółć (eng. 'bile') - 'small z, dot above' + 'small o, acute accent' + 'small l, stroke' + 'small c, acute accent'

Posted: Wed Apr 18, 2007 8:32 am
by pipok
Solfius wrote:eg a note called "Ęŕđňŕ (1077)-"

the date is obvious, but I've never seen that many special characters in a single word before. Is it possibly a different language again?
It's Russian, in Windows-1251. The word is 'карта' (Latin transcription: 'karta') and means 'a map'.
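
If you don't know which encoding to try, one approach is to recover the raw bytes and print them under several candidates. The sketch below assumes the garbled title was being displayed as ISO-8859-2 (Central European) at the time, which matches the characters quoted above. The list of candidate encodings is just my pick of common Cyrillic ones.

```python
garbled = "Ęŕđňŕ"

# Assumption: the browser was showing the note as ISO-8859-2,
# so re-encoding with it gives back the original bytes.
raw = garbled.encode("iso-8859-2")

# Common Cyrillic encodings to try; only one should give a real word.
candidates = ["cp1251", "koi8-r", "iso-8859-5"]
readings = {name: raw.decode(name, errors="replace") for name in candidates}

for name, text in readings.items():
    print(f"{name:>10}: {text}")
```

Only the cp1251 (Windows-1251) line should read as the Russian word for 'map'; the others come out as nonsense.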

Posted: Wed Apr 18, 2007 8:32 am
by marol
pipok's first post does not change anything in this matter.

Posted: Wed Apr 18, 2007 8:48 am
by pipok
marol wrote:pipok's first post does not change anything in this matter.
Right, it doesn't. Relax, I'm not against you, marol. Just a comment.