this post was submitted on 20 Apr 2026
interestingasfuck
You're close. Unicode characters don't imply a number of bytes; it's how they're encoded that does (UTF-8 most commonly). A character in UTF-8 can take as little as one byte or as many as four, depending on the specific character. I don't know about emojis, but I imagine they're in the four-byte section, whereas "asdf" is also four bytes in UTF-8 (one byte per letter).
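If you want to see the variable width for yourself, here's a rough Python sketch. The helper name `utf8_len` and the sample characters are just my picks for illustration; the only real API used is the built-in `str.encode`.

```python
# Sketch: how many bytes a single character takes once encoded as UTF-8.
# utf8_len is a made-up helper name, not a library function.

def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))

# One sample per width; escapes are used so the file itself stays ASCII.
samples = [
    ("a", "Latin small letter a"),           # U+0061
    ("\u00e9", "e with acute accent"),       # U+00E9
    ("\u20ac", "euro sign"),                 # U+20AC
    ("\U0001F335", "cactus emoji"),          # U+1F335
]

for ch, name in samples:
    print(f"{name}: {utf8_len(ch)} byte(s)")
```

The same string can come out at a different size under another encoding (UTF-16 or UTF-32, say), which is the point: the byte count belongs to the encoding, not to the character.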
So I just looked it up: the UTF-8 encoding for the cactus emoji (U+1F335) is 4 bytes long: 0xF0 0x9F 0x8C 0xB5
The basic Latin alphabet, on the other hand, is in the 1-byte region.
So it takes 6 bytes to transmit "cactus" in UTF-8, and only 4 to transmit "🌵". So any single 4-byte emoji that replaces 5 or more letters is more efficient. 🍆 breaks even with "dick" or "cock", is more efficient than "penis", exactly twice as compact as "eggplant", and more than twice as compact as "aubergine".
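For anyone who wants to check the arithmetic, a quick Python sketch follows. The names `utf8_bytes` and `savings` are made up for this comment, and it assumes each emoji is a single code point (some emoji are multi-code-point sequences and take more than 4 bytes).

```python
# Sketch: compare the UTF-8 size of a word against a single emoji.
# utf8_bytes and savings are hypothetical helper names.

CACTUS = "\U0001F335"    # U+1F335
EGGPLANT = "\U0001F346"  # U+1F346

def utf8_bytes(text: str) -> int:
    """Number of bytes needed to transmit text as UTF-8."""
    return len(text.encode("utf-8"))

def savings(word: str, emoji: str) -> int:
    """Bytes saved by sending the emoji instead of the word (0 = break even)."""
    return utf8_bytes(word) - utf8_bytes(emoji)

# Raw bytes of the cactus emoji, as hex.
print(CACTUS.encode("utf-8").hex(" "))

print("cactus:", savings("cactus", CACTUS))
for word in ["dick", "cock", "penis", "eggplant", "aubergine"]:
    print(f"{word}:", savings(word, EGGPLANT))
```

A result of 0 means break even, and anything positive means the emoji is the cheaper option on the wire.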