this post was submitted on 20 Apr 2026
520 points (99.1% liked)

interestingasfuck

captain_aggravated@sh.itjust.works 2 points 8 hours ago (1 children)

How do emoji use more data? They're one, maybe two unicode characters?

SkaveRat@discuss.tchncs.de 1 points 6 hours ago (1 children)
captain_aggravated@sh.itjust.works 1 points 6 hours ago (2 children)

Than an entire word?

Take "cactus" for example. Each letter in the word "cactus" is one Unicode character, for a total of six. 🌵 is one Unicode character, U+1F335.

Unicode characters are 4 bytes long, so "cactus" takes 24 bytes to transmit, where "🌵" takes 4. Unless something something UTF-8?

onlyhalfminotaur@lemmy.world 1 points 5 hours ago* (last edited 5 hours ago) (1 children)

You're close. Unicode characters don't imply a number of bytes; it's how they're encoded that does (UTF-8 most commonly). A character in UTF-8 can be as little as one byte or as many as four, depending on the specific character. I don't know about emojis but I imagine they're in the four-byte section. Whereas "asdf" is also four bytes in UTF-8.
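To see that the byte count comes from the encoding and not from the character itself, here is a minimal Python sketch. The helper name `encoded_sizes` and the sample characters are made up for illustration; only the standard `str.encode` method is used.

```python
# Byte length depends on the encoding, not on the code point itself.
# Escapes are used so the example does not depend on the file's own encoding.
SAMPLES = ["a", "\u00e9", "\u20ac", "\U0001F335"]  # a, e-acute, euro sign, cactus


def encoded_sizes(ch):
    """Return how many bytes `ch` occupies in three common encodings."""
    # The "-le" variants are used so that no byte order mark is prepended.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

In UTF-32 every character is 4 bytes, which is where a figure like 24 bytes for "cactus" would come from. In UTF-8 the same four samples take 1, 2, 3, and 4 bytes respectively.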

So I just looked it up, the UTF-8 encoding for the cactus emoji is 4 bytes long: 0xF0 0x9F 0x8C 0xB5

Whereas the basic Latin alphabet (ASCII) is in the 1-byte region.

So it takes 6 bytes to transmit "cactus" in UTF-8, and only 4 to transmit "🌵". So any emoji that replaces 5 or more letters is more efficient. 🍆 breaks even with "dick" or "cock", is more efficient than "penis", twice as compact as "eggplant", and more than twice as compact as "aubergine".
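The arithmetic above can be checked with a short Python sketch. The helper `utf8_len` is a hypothetical name introduced here, and the comparison assumes a single-code-point emoji with no variation selectors or joiners attached.

```python
def utf8_len(text):
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


CACTUS = "\U0001F335"    # U+1F335
EGGPLANT = "\U0001F346"  # U+1F346

# The cactus emoji encodes to the four bytes quoted in the thread.
print(CACTUS.encode("utf-8").hex(" "))  # f0 9f 8c b5

# ASCII letters cost one byte each, so a word's cost is just its length.
for word in ["cactus", "penis", "eggplant", "aubergine"]:
    print(word, utf8_len(word), "bytes")

print("emoji", utf8_len(CACTUS), "bytes")
```

Any ASCII word of five or more letters is longer than the 4-byte emoji, and a four-letter word ties with it.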

SkaveRat@discuss.tchncs.de 1 points 6 hours ago (1 children)

But that's not what people are doing

They always use a word and an emoji

Yes, to be clear I meant the example I gave where the word was replaced with the emoji was compression, not where they give the word and its emoji. That's as long-handed as possible.
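For comparison, here is a rough Python sketch of the three ways to send the same idea in UTF-8. The variable names are illustrative, and the word-plus-emoji form assumes a single ASCII space between the two.

```python
CACTUS = "\U0001F335"  # U+1F335, 4 bytes in UTF-8

# Three ways to say the same thing, measured in UTF-8 bytes.
variants = {
    "word only": "cactus",
    "emoji only": CACTUS,
    "word and emoji": "cactus " + CACTUS,
}

sizes = {label: len(text.encode("utf-8")) for label, text in variants.items()}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

The word-plus-emoji form costs 6 + 1 + 4 = 11 bytes, more than either alternative on its own.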