Char Encoding

ASCII was the standard for representing characters using 7 bits, which gives 128 possible values. It covered mostly unaccented English letters, digits, punctuation, and some control characters. Other countries used variations of it or created their own systems, and these were largely incompatible with one another. This became a major issue with the rise of the World Wide Web: text exchanged between computers often couldn’t be displayed properly.

Unicode arose out of this. The Unicode Consortium, the organization that manages it, set out to collect the characters of the world’s writing systems into a single character table, giving each one a unique number (a code point). Unicode has room for over a million characters (1,114,112 code points), although only ~109,000 had been assigned as of Unicode 6.0.
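
As a small illustration of the character table idea, the Python sketch below looks up the number (code point) that Unicode assigns to a few characters. It uses only Python's built-in `ord`, `sys.maxunicode`, and the standard `unicodedata` module; the helper name `code_point_label` is made up for this example.

```python
import sys
import unicodedata

def code_point_label(ch):
    """Return the conventional U+XXXX label for a single character (hypothetical helper)."""
    return "U+{:04X}".format(ord(ch))

# Highest code point Python can represent: 0x10FFFF,
# so the code space holds 0x10FFFF + 1 = 1,114,112 values.
MAX_CODE_POINT = sys.maxunicode

# Print the table entry (label and official name) for a few characters.
for ch in ["A", "é", "€", "😀"]:
    print(code_point_label(ch), unicodedata.name(ch))
```

Note that a code point is only a number in the table; it says nothing yet about how that number is stored as bytes.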

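UTF-8 is an encoding that was devised to map this character set to bytes in a computer. It is a variable-width encoding that uses between 1 and 4 bytes per character. The good thing about UTF-8 is that the first 128 characters, the ones shared with ASCII, are encoded as a single byte with the same value as in ASCII, so any ASCII text is already valid UTF-8. This allowed for an easier transition and backwards compatibility.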
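
To make the ASCII compatibility concrete, here is a minimal Python sketch that encodes a few characters with the built-in `str.encode` and shows how many bytes each one takes. The helper name `utf8_bytes` and the sample characters are assumptions chosen for illustration.

```python
def utf8_bytes(text):
    """Encode text as UTF-8 and return the byte values as a list of integers (hypothetical helper)."""
    return list(text.encode("utf-8"))

# An ASCII character produces the same single byte under ASCII and UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii")

# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes used, and the bytes in hex.
    print("U+{:04X}".format(ord(ch)), len(encoded), encoded.hex())
```

The four samples take 1, 2, 3, and 4 bytes respectively, and because pure ASCII bytes are left unchanged, `b"hello".decode("utf-8")` gives back `"hello"` without any conversion step.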