this post was submitted on 30 Dec 2025

3 points (80.0% liked)

Secure Coms

This is a community for enthusiasts who love to ponder new ways for Alice to communicate with Bob in a world where global passive adversaries probably record every bit that ever crosses the wire.

Discuss cryptography, secure key exchange, private messengers, radios, encoding, networking tools, authentication mechanisms, and anything relevant to coming up with ways for Alice to get a message to Bob.

founded 4 months ago

MODERATORS

cm0002@programming.dev

hereforawhile@programming.dev

Huge Computer Assisted Codebook (gitlab.com)

submitted 2 months ago by cm0002@no.lastname.nz to c/securecoms@programming.dev

[–] jasory@programming.dev 1 points 2 months ago (1 children)

What motivated you to write this program?

Your choice of "codebook" is an immediate red flag and reeks of pop-crypto. There is a reason this approach was abandoned some 100+ years ago: even properly implemented, codebooks have severe shortcomings.

[–] hereforawhile@programming.dev 1 points 2 months ago* (last edited 2 months ago) (1 children)

> What motivated you to write this program?

Just for fun basically.

I've had the idea for a while, but the problem was always the huge amount of grunt work needed to create the initial database. With the help of an LLM, I basically mined all the unique entries and common phrases.

I'm not claiming it's the best or anything at all. But by codebook standards... I tried to implement all the things that would make a good codebook.

Ability to say the same thing over and over and make it look different each time, to mitigate frequency analysis
Easy, secure shuffling
Customizable
Assisted composing
Exportable
Long-term rotating key schema
Conclusive and established database
Portable
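
To illustrate the first point, here is a minimal sketch (not the project's actual code), assuming each phrase is stored with several interchangeable code numbers. The tiny `CODEBOOK` and the `encode`/`decode` names are made up for the example.

```python
import secrets

# Hypothetical mini codebook: each phrase owns several code numbers,
# so repeating a phrase does not have to repeat a code.
CODEBOOK = {
    "meet at noon": [10412, 88231, 40077],
    "abort": [5150, 73019],
}

# Reverse table for decoding: code number -> phrase.
DECODE = {code: phrase for phrase, codes in CODEBOOK.items() for code in codes}


def encode(phrase: str) -> int:
    """Pick one of the phrase's codes at random using the OS CSPRNG."""
    codes = CODEBOOK[phrase]
    return codes[secrets.randbelow(len(codes))]


def decode(code: int) -> str:
    """Map any of a phrase's codes back to the phrase."""
    return DECODE[code]
```

Sending "meet at noon" three times can produce three different numbers, and all of them decode to the same phrase. How well this flattens frequencies depends on how many duplicates each common entry gets.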

[–] jasory@programming.dev 1 points 2 months ago (1 children)

Why did you use an LLM for the frequency tables? The "most common words used" is very useful data and as such there are many already existing compilations, used by things like spell checkers. The Linux system dictionaries are one example.

The fact that you completely ignore that simply using a larger RSA key would both be faster and more secure than your approach, doesn't inspire confidence either.

(It's also in Python, which is basically unusable.)

[–] hereforawhile@programming.dev 1 points 1 month ago (1 children)

I used an LLM to create my database because it is not only a collection of words but also of common phrases. Plus, not only can the LLM format the database how I want it so the program can interpret it, it can also build the database and include the appropriate number of duplicates.

> The fact that you completely ignore that simply using a larger RSA key would both be faster and more secure than your approach, doesn't inspire confidence either.

The goal was to not use any modern crypto... Codebooks have been used for a very long time and are secure with proper key management.

This is an attempt at a modern codebook. It tackles almost all of the shortcomings of previous iterations.

> (It's also in Python, which is basically unusable.)

Haha.

[–] jasory@programming.dev 1 points 1 month ago (1 children)

"but common phrases". These also exist; they are used in grammar checkers. They also exist in texts for English learners.

Datasets like these are very easy to come by. In fact, you could write a program that sets up a Markov matrix of word pairs for any input text and uses it to determine common phrases. This is the standard sloppy approach; a more clever one would restrict the pairing to grammatically valid pairs.
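
A minimal sketch of that pair-counting idea, assuming plain whitespace tokenization and no grammar filtering (so this is the "sloppy" version). The function names are made up for the example.

```python
from collections import Counter, defaultdict


def bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        table[first][second] += 1
    return table


def transition_probability(table, first, second):
    """Row-normalized count: estimated P(second | first)."""
    total = sum(table[first].values())
    return table[first][second] / total if total else 0.0


def common_phrases(text, top=3):
    """Return the most frequent word pairs in the text."""
    table = bigram_counts(text)
    pairs = Counter(
        {(a, b): n for a, followers in table.items() for b, n in followers.items()}
    )
    return pairs.most_common(top)
```

Normalizing each row of the count table gives the Markov transition matrix, and the highest-count pairs are the candidate common phrases.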

[–] hereforawhile@programming.dev 1 points 1 month ago (1 children)

I mean, what's the real point you are arguing? I'm happy to include other datasets in the master database. A bigger database is no problem for this schema or for SQLite's limits.

The LLM produced all these things with one or two prompts and they are all grammatically valid... It's just what I happened to source the initial data set from.

[–] jasory@programming.dev 1 points 1 month ago (1 children)

My point is that your approach is awful. It's like you completely fumbled into your idea, and you're trying to sell it as superior to rigorously constructed cryptosystems (nearly all exploits are due to developer incompetence, not cryptographers).

"They are all grammatically valid": yeah, you have no idea what I just said. I was talking about constructing a probability matrix from a language. If you restrict the entries to grammatically valid pairs/tuples, it reduces the size and is therefore easier to compute. Whether or not your ciphertext is grammatically valid English has zero effect on its strength.

The reason why you might want to take the approach I described is that you can make precise claims about the dataset and final result, rather than saying "umm ... ChatGPT said so..".

Regardless, this has nothing to do with cryptographic security. It's just an immediate red flag when developers miss obvious solutions.

[–] hereforawhile@programming.dev 1 points 1 month ago (1 children)

I'm not selling anything though... it's completely free? What are you talking about?

> The reason why you might want to take the approach I described is that you can make precise claims about the dataset and final result, rather than saying "umm ... ChatGPT said so.."

You realize it's just a database file that you can look at, right? You don't think I've looked at the database?

It's a modern implementation of an ancient form of secure communication that has been used for thousands of years, supercharged by a computer. Not sure why you are so triggered. It does exactly what I'm claiming it does.

[–] jasory@programming.dev 1 points 1 month ago* (last edited 1 month ago) (1 children)

Again, you have no idea what I'm talking about. I'm not a cryptographer, but I've done a bit of computational number theory (a strongly related field), and I speak from that field.

I'm describing how you make rigorous, provable claims, rather than "it's an ancient technique supercharged by a computer", which quite frankly makes you sound like a child.

"It's just a database file": you realize the contents of the database don't matter? The security comes from the mapping. The database contents could be pure nonsense symbols (in fact they should be: one of the criteria of modern cryptosystems is indistinguishability from random data, because it defeats pattern-based attacks). I was simply pointing out that your approach was very amateurish, since those databases have already existed for 30+ years.

"Not selling anything"

"Selling" is a common synonym for "convincing" or "advertising". Saying "I sold him on that" means you convinced someone.

I'm going to let this discourse die because it seems like you don't understand what I'm saying.

[–] hereforawhile@programming.dev 1 points 1 month ago* (last edited 1 month ago)

I do want to reply because I think my claims are reasonable.

The only actual cryptographic function in the schema is secrets.randbelow(). Scrutinize this function if you don't think it can achieve what I am claiming it can.

The randomize function takes each ID and assigns it a new integer. It takes its entropy at the OS level; no seed values are used. It's never going to repeat in a billion years. Because there are 2 million+ entries, the number of possible mappings is essentially limitless. You could stack 1-petabyte drives across our entire universe and still not be able to capture every possible state.

This function is well documented and (as far as I know) is one of the best available CSPRNGs you can actually use on a device.
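
A rough sketch of what such a randomize step could look like, assuming it is a Fisher-Yates permutation of the IDs driven by secrets.randbelow(). The names `make_shuffle_map` and `log10_permutations` are made up for the example, and the real program may do this differently.

```python
import math
import secrets


def make_shuffle_map(ids):
    """Assign every ID a new ID via a Fisher-Yates shuffle.

    secrets.randbelow draws from the OS entropy source; no seed is involved.
    """
    ids = list(ids)
    shuffled = list(ids)
    for i in range(len(shuffled) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform integer in [0, i]
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return dict(zip(ids, shuffled))


def log10_permutations(n):
    """Return log10(n!), the size of the key space for n entries."""
    return math.lgamma(n + 1) / math.log(10)
```

Under that assumption, `log10_permutations(2_000_000)` comes out at roughly 11.7 million, meaning the number of possible shuffle maps has about 11.7 million decimal digits.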

Here is an example of the raw shuffle map that is generated.

Before the shuffle map is loaded, if you query your word, you're going to get the raw, unshuffled message ID associated with it.

Once a shuffle map is generated and loaded into the program, the query simply looks up the new CSPRNG-assigned integer.

The shuffle map can now be considered the key. Because this is a pure lookup table, there is no algorithm to attack aside from guessing how my exact device generated the shuffle map in its exact moment of existence... that's where the strength of this schema lies.

Thanks for the discourse. I've enjoyed the pushback even though we can't agree.
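
The lookup described above can be sketched like this. It is a toy illustration only: the three-entry database, the hard-coded map, and the `query`/`lookup` names are all hypothetical and not taken from the program.

```python
# Hypothetical raw database: phrase -> unshuffled message ID.
RAW_IDS = {"hello": 1, "meet at noon": 2, "abort": 3}
ID_TO_PHRASE = {raw: phrase for phrase, raw in RAW_IDS.items()}

# Hypothetical shuffle map (the key): raw ID -> CSPRNG-assigned integer.
SHUFFLE_MAP = {1: 3, 2: 1, 3: 2}
# The receiver needs the inverse direction to read a message.
INVERSE_MAP = {new: raw for raw, new in SHUFFLE_MAP.items()}


def query(phrase, shuffle_map=None):
    """Return the raw ID, or the shuffled ID once a map is loaded."""
    raw = RAW_IDS[phrase]
    return raw if shuffle_map is None else shuffle_map[raw]


def lookup(code, inverse_map=None):
    """Turn a received number back into its phrase."""
    raw = code if inverse_map is None else inverse_map[code]
    return ID_TO_PHRASE[raw]
```

Without the map, `query("hello")` gives the raw ID 1; with it, the same query gives 3, and only someone holding the same map reads 3 back as "hello" instead of "abort".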

Edit*

Take a look at the new pack62 compression though!