this post was submitted on 19 Nov 2025
69 points (96.0% liked)

Data Hoarder


I've been running OCR on the recent House Epstein email dump. I'm making this available now that it's close to finishing (20k/23k emails processed).

Processing script available here: https://codeberg.org/sillyhonu/Image_OCR_Processing_Epstein

I also put an analysis script in there if you want to use Drive/Colab.

Currently finished files are available here:

https://files.catbox.moe/xrgts0.sqlite
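
For anyone who wants to look inside the file before running the analysis script, here is a minimal sketch that lists whatever tables and columns a SQLite file contains. It makes no assumption about the schema, which isn't described in this post. The function name `list_tables` and the local filename in the usage comment are placeholders of mine, not something taken from the repo.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables, indexes, etc.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


# Usage (assumes the download was saved locally under this name):
# for table, columns in list_tables("xrgts0.sqlite").items():
#     print(table, columns)
```

Once you know the real table and column names, you can query the data with plain SQL or load it into whatever tool you prefer.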

[–] Typotyper@sh.itjust.works 2 points 10 hours ago (1 children)

Why are all the emails from Epstein and very few to him? Is this what Congress is holding back?

[–] TropicalDingdong@lemmy.world 1 points 3 hours ago (1 children)

That's all combined: To, From, CC, etc.

It's his email account. So yeah, he's in all of them.

[–] Typotyper@sh.itjust.works 1 points 3 hours ago (1 children)

Maybe I'm missing something. In the bottom left graph there are 2200+ emails "from" him and only ~150 "to" him.

He should have a lot more "to" him.

[–] TropicalDingdong@lemmy.world 1 points 2 hours ago* (last edited 2 hours ago) (1 children)

Oh.

Yeah that might be a formatting artifact. Or it might speak to the fact that we just receive far more than we send.

Many of the emails are digests or news articles. For example, the NYT might send out a headlines email. You just receive it and aren't going to respond, so it only gets a "from" and no "to".

There is a lot of just... crap in there. At least two partial books. Random stuff from forums and threads.
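
One rough way to test the "formatting artifact" idea is to count how many records ended up with a sender but no recipient after OCR. This is only a sketch: the `from` and `to` keys are hypothetical field names, not the actual columns in the SQLite file, and the records are assumed to be plain dicts.

```python
from collections import Counter


def header_presence(records):
    """Tally records by whether a non-empty 'from' and 'to' value was parsed.

    Keys of the returned Counter are (has_from, has_to) boolean pairs.
    The 'from' and 'to' field names are assumptions for illustration.
    """
    tally = Counter()
    for rec in records:
        # Treat missing, None, and whitespace-only values as absent.
        has_from = bool((rec.get("from") or "").strip())
        has_to = bool((rec.get("to") or "").strip())
        tally[(has_from, has_to)] += 1
    return tally
```

A large `(True, False)` bucket would suggest that many messages simply have no parsed recipient, whether because of OCR misses or because they are one-way mailers.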

[–] Typotyper@sh.itjust.works 1 points 2 hours ago (1 children)

For emails I respond to, there are roughly equal numbers. For emails I send to people I deal with, there are roughly equal numbers. Some businesses ignore me, so those would skew things.

Maybe things will balance out once they release them all.

I seriously expect the Epstein files to spawn JFK-assassination-level conspiracies which linger for decades.

[–] TropicalDingdong@lemmy.world 1 points 2 hours ago

I doubt you actually do, and I doubt most people do. The vast vast majority of email is sent and never even read.

You are only thinking about email used as direct correspondence, but how many random mass-mailer emails have landed in your inboxes today? Tens? Hundreds?

I have another figure I can send you, but let me get some coffee in me. It's a frequency analysis in time.
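
For reference, a frequency analysis in time can be as simple as bucketing message dates by month. The sketch below is not necessarily how my figure was produced. It assumes the dates are already available as ISO-format strings (for example `2010-03-05`), which may not hold for raw OCR output, and `monthly_counts` is just an illustrative name.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(dates):
    """Count messages per calendar month from ISO-format date strings.

    Values that cannot be parsed (None, garbled OCR text) are skipped.
    Returns a dict keyed by 'YYYY-MM', sorted chronologically.
    """
    counts = Counter()
    for value in dates:
        try:
            parsed = datetime.fromisoformat(value)
        except (TypeError, ValueError):
            continue  # skip unparseable or missing dates
        counts[parsed.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))
```

Plotting the resulting counts as a bar chart gives a quick view of when the account was most active.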