Why did you use an LLM for the frequency tables? "Most common words used" is very useful data, and there are already many existing compilations of it, used by things like spell checkers. The Linux system dictionaries are one example.
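For that matter, a frequency table is trivial to compute yourself from any corpus you trust. A minimal sketch (the sample text is just a placeholder for whatever corpus you'd actually feed in):

```python
from collections import Counter
import re

def word_frequencies(text: str) -> Counter:
    # Split on runs of letters/apostrophes, case-insensitively, and count.
    return Counter(re.findall(r"[a-z']+", text.lower()))

sample = "the cat and the dog and the bird"
freqs = word_frequencies(sample)
print(freqs.most_common(3))  # "the" (3) and "and" (2) top the list
```

Point the same three lines at a few megabytes of real text and you have a frequency table with no LLM involved.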
The fact that you completely ignore that simply using a larger RSA key would be both faster and more secure than your approach doesn't inspire confidence either.
(It's also in Python, which is basically unusable.)
What motivated you to write this program?
Your choice of "codebook" is an immediate red flag and reeks of pop-crypto. There is a reason this approach was abandoned some 100+ years ago: even properly implemented, codebooks have severe shortcomings.
ChatGPT just cribs from Stack Overflow, which in turn just cribs its answers from documentation. Once you figure that out, they both become surprisingly useless.
I just want tropical/desert textures. I know penguins like the cold, but people like variety.
"Making frequency analysis ineffective"
Oh boy, let's hope nobody uses it for large plaintexts. If x maps to k1, k2, ..., then one simply needs enough instances of x to reconstruct the key. At the very minimum, it would need multiple plaintext symbols mapping to the same strings to achieve any ambiguity.
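A toy demonstration of the attack, against a hypothetical homophonic scheme of my own construction (not the submitted program), where each plaintext letter draws from its own fixed pool of cipher symbols:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical homophonic cipher: each letter gets a fixed pool of
# integer symbols; common letters get more homophones.
key = {}
next_sym = 0
for ch, pool in [("e", 3), ("t", 3), ("a", 3), ("h", 1), ("i", 1),
                 ("n", 1), ("p", 1), ("r", 1), ("s", 1), ("l", 1),
                 ("m", 1), ("y", 1)]:
    key[ch] = list(range(next_sym, next_sym + pool))
    next_sym += pool

plaintext = "theraininspainstaysmainlyintheplain" * 200
cipher = [random.choice(key[c]) for c in plaintext]

# Known-plaintext attack: pair each plaintext letter with the symbol it
# produced. Enough repetitions of a letter reveal its entire pool.
recovered = {}
for p, c in zip(plaintext, cipher):
    recovered.setdefault(p, set()).add(c)

assert all(recovered[ch] == set(pool) for ch, pool in key.items())
```

With a few hundred occurrences of each letter, every homophone shows up and the full key falls out; no cleverness required.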
The cryptographic claims seem laughable.
Maybe to match against the passport photos. Some people travel with other, similar-looking people's IDs, and inspectors can miss it. So having a current photo of the traveler can be used in post-hoc investigations to determine whether they did so.
The bizarre thing is that they are only analysing something like 40 programs/libraries. You could reach the same conclusion by clicking through their GitLab for a few minutes.
The translation rate is the actually interesting part.
My usual defeater goes like this:
It is logically possible to create a very clever mechanical device that for any finite input would output exactly the same behaviour a human could, yet we would clearly consider it to be unconscious clockwork. We have no reason to believe that electrical computers are any different.
I've used ffmpeg to compose a whole video from screen recordings. The big advantage is that it works well on weaker hardware that can't run full-blown video editors.
You're correct in your assessment of the worst case for distro maintainers. However, many distro developers/maintainers do contribute to the upstream (Debian policy explicitly encourages it; I only speak for Debian because that's the only project I've worked in), and they do vet and understand the software.
"It can't be better"? Except distro maintainers can block software from being included if they find errors. As noted above, they also often file pull requests against the upstream. That actually happens a fair amount.