Unicode - Don't trust your eyes

It should be nothing new when I say "Don't trust your eyes".
But when it comes to Unicode specifically, I feel like saying it again: "Really, don't".

This short Python snippet makes the point:
the two strings "Unicode" are equal, but the next two strings are not, even though they look alike.

In fact, the first, 口 (U+53E3), is an ordinary Chinese character meaning "mouth", while the second, ⼝ (U+2F1D), is its radical form from the Kangxi Radicals block.

Unicode contains many characters that look alike: several dot-like characters, IPA letters that resemble letters of the Latin alphabet, and especially characters in the CJK blocks. The CJK look-alikes are not only radicals; many come from the so-called "source separation" rule, which keeps characters apart when a single source standard already encoded them separately.

Here's the full code:

>>> u'Unicode' == u'Unicode'
True
>>> u'口' == u'⼝'
False
>>> ord(u'口')
21475
>>> ord(u'⼝')
12061
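
If you want to see what your eyes are missing, the standard `unicodedata` module can help. The following is only a minimal sketch, assuming Python 3.3 or later; the helper names `describe` and `looks_equal` are made up for this post and are not part of any library. It prints the official name of each character and then compares the two after NFKC compatibility normalization, which maps the Kangxi radical back to the ordinary ideograph.

```python
import unicodedata

IDEOGRAPH = u"\u53e3"  # 口, the ordinary character for "mouth"
RADICAL = u"\u2f1d"    # ⼝, the Kangxi radical form


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return u"U+%04X %s" % (ord(ch), unicodedata.name(ch))


def looks_equal(a, b):
    """Hypothetical helper: compare two strings after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(describe(IDEOGRAPH))               # U+53E3 CJK UNIFIED IDEOGRAPH-53E3
print(describe(RADICAL))                 # U+2F1D KANGXI RADICAL MOUTH
print(IDEOGRAPH == RADICAL)              # False
print(looks_equal(IDEOGRAPH, RADICAL))   # True
```

Note that NFKC only folds characters that Unicode itself declares compatibility-equivalent. Look-alikes from different scripts, such as Latin "a" and Cyrillic "а", still compare as different after normalization, so this is a partial aid rather than a cure.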