In earlier versions of ponysay, only the output truncation supported the
Universal Character Set, through hand-coded UTF-8 character counting. Now
ponysay lets Python decode the data. Python stores all 31 bits of a
character as one character, not in UTF-16 as some other languages do, which
means that the code is agnostic to the character encoding. However, in Unicode
6.1 there are four ranges of combining characters. These do not take up any
width in a proper terminal, so the code has a class named UCS
that helps take them into consideration when determining the length of a string.
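
As a rough illustration of the idea, the sketch below counts the visible
length of a string by skipping combining characters. It is not the UCS class
itself: the names COMBINING_RANGES, is_combining and display_length are
hypothetical, and the four ranges are assumed to be the Unicode blocks
Combining Diacritical Marks, its Supplement, Combining Diacritical Marks for
Symbols, and Combining Half Marks.

```python
# Assumption: these four Unicode blocks are the combining ranges meant above.
COMBINING_RANGES = (
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
    (0x20D0, 0x20FF),  # Combining Diacritical Marks for Symbols
    (0xFE20, 0xFE2F),  # Combining Half Marks
)


def is_combining(char):
    """Return True if the character falls in one of the assumed ranges."""
    code = ord(char)
    return any(low <= code <= high for low, high in COMBINING_RANGES)


def display_length(text):
    """Count the characters that take up a column in the terminal."""
    return sum(1 for char in text if not is_combining(char))


if __name__ == "__main__":
    accented = "po\u0301ny"  # 'o' followed by a combining acute accent
    print(len(accented), display_length(accented))  # prints "5 4"
```

Here len counts the combining accent as a character of its own, while
display_length reports the four columns the string occupies on screen.
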
Some ponies have names that contain non-ASCII characters; read about it in
Environment variables. The UCS names are stored in the file share/ucsmap.
In that file, lines that are not empty and do not start with a hash (#) are
parsed, and each contains a UCS name and an ASCII:ised name. The UCS name comes first,
followed by the ASCII:ised name that the UCS name should replace or link towards.
The two names are separated by a simple left-to-right arrow character [U+2192],
optionally surrounded by white space.
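
The file format described above could be read with something like the
following sketch. It is not ponysay's own parser: the function name
parse_ucsmap and the sample names are hypothetical, and it assumes that
surrounding white space is stripped before a line is tested and that lines
without an arrow are silently skipped.

```python
ARROW = "\u2192"  # the left-to-right arrow that separates the two names


def parse_ucsmap(lines):
    """Map each UCS name to its ASCII:ised name."""
    mapping = {}
    for line in lines:
        line = line.strip()  # assumption: surrounding white space is ignored
        if not line or line.startswith("#"):
            continue  # empty lines and comment lines are not parsed
        if ARROW not in line:
            continue  # assumption: malformed lines are skipped
        ucs_name, ascii_name = line.split(ARROW, 1)
        mapping[ucs_name.strip()] = ascii_name.strip()
    return mapping


if __name__ == "__main__":
    sample = [
        "# made-up names, for illustration only",
        "",
        "exampl\u00e9 \u2192 example",
        "t\u00ebst\u2192test",
    ]
    for ucs_name, ascii_name in parse_ucsmap(sample).items():
        print(ucs_name, "maps to", ascii_name)
```

To read the real file, open it with an explicit encoding, for example
open("share/ucsmap", encoding="utf-8"), and pass the file object to
parse_ucsmap.
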