Unicode
Unicode is a superset of ASCII.
Tip
Unicode has replaced ASCII in common use.
The format
Written as “U+0048”: “U+” followed by the code point in hexadecimal, padded to at least 4 digits.
- ChatGPT tells me the full range is 0000 to 10FFFF, so code points outside the BMP (brief mention below) need 5 or 6 digits.
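A quick sketch of the notation in Python. The helper name `code_point_label` is made up for these notes; `ord` and the format spec are standard Python.

```python
def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a single character (hypothetical helper)."""
    # ord() gives the code point as an integer; pad to at least 4 hex digits.
    return f"U+{ord(ch):04X}"

print(code_point_label("H"))           # U+0048
print(code_point_label("\U0001F600"))  # U+1F600 (outside the BMP, so 5 digits)
```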
Info from CS138:
- spans more than 100,000 characters over languages both real and fake!
- A Unicode code point fits in 21 bits (so 3 bytes would be enough to hold one) and ranges from 0 to 1,114,111, which gives 1,114,112 code points in total. This last number comes from the 17 planes which Unicode is divided into multiplied by the 2^16 (65,536) code points in each plane (a contiguous block).
- Plane 0 is the BMP (Basic Multilingual Plane)
- The first 128 Unicode code points share the same values as ASCII. This was necessary for adoption by the Western World, which had ASCII first.
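The arithmetic in these notes can be checked with a few lines of Python. This is only a sanity check, not anything from the course; the constant names and the `plane_of` helper are my own.

```python
import sys

PLANES = 17                      # planes 0 through 16
CODE_POINTS_PER_PLANE = 2 ** 16  # 65,536 code points in each plane

total_code_points = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = total_code_points - 1

def plane_of(ch: str) -> int:
    """Return the plane number of a single character (hypothetical helper)."""
    # Each plane holds 2**16 code points, so the plane is whatever sits above the low 16 bits.
    return ord(ch) >> 16

print(total_code_points)                     # 1114112
print(hex(highest_code_point))               # 0x10ffff
print(highest_code_point.bit_length())       # 21
print(highest_code_point == sys.maxunicode)  # True on Python 3.3+
print(plane_of("H"))                         # 0, the BMP
print(ord("A"), "A".encode("ascii")[0])      # 65 65, same value in ASCII and Unicode
```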
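Unicode code points are just numbers assigned to characters. How those numbers are stored as bytes is a separate matter, handled by an encoding such as UTF-8, UTF-16, or UTF-32.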
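To see that the byte-level form is a separate question from the code point, the sketch below encodes the same character three ways using Python's built-in codecs. The helper name `encoded_sizes` is my own, and the choice of these three encodings is just for illustration.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the byte length of one character under three encodings (hypothetical helper)."""
    # The "-le" variants fix the byte order, so no byte-order mark is added.
    return {name: len(ch.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("H"))           # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("\U0001F600"))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

The code point for "H" is U+0048 in every case; only the number of bytes used to store it changes.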