On January 1, I received a bill from my web hosting provider for a bandwidth overage for $155. I've never had this happen before. For comparison, I pay about $400/year for the hosting service, and usually the limitation is disk space.
Turns out, on December 17, my bandwidth usage jumped dramatically - see the attached graph.
I run a few different sites, but tech support was able to help me narrow it down to one site. This is a hobbyist site, with a small phpBB forum, for a very specific model of motorhome that hasn't been built in 25 years. This is NOT a high traffic site; we might get a new post once a week...when it's busy. I run it on my own dime; there are no ads, no donation links, etc.
Tech support found that AI bots were crawling the site repeatedly. In particular, OpenAI's bot was hitting it extremely hard.
Here's an example: There are about 1,500 attachments to posts (mostly images), totaling about 1.5 GB on disk. None of these are huge; a few are in the 3-4 megabyte range, probably larger than necessary, but not outrageously large either. The bot pulled 1.5 terabytes on just those pictures. It kept pulling the same pictures repeatedly and only stopped because I locked the site down. This is insane behavior.
I locked down the pictures so you had to be logged in to see them, but the attack continued. This morning I took the site offline to stop the deluge.
My provider recommended implementing Cloudflare, which initially irritated me, until I realized there was a free tier. Cloudflare can block bots, apparently. I'll re-enable the site in a few days after the dust settles.
I contacted OpenAI, arguing with the chatbot on their site, demanding that the bug that caused this be fixed. The bot suggested things like "robots.txt", which I did, but...come on, the bot shouldn't be doing that, and I shouldn't be on the hook to fix their mistake. It's clearly a bug. Eventually the bot gave up talking to me, and an apparent human emailed me with the same info. I replied, trying to tell them that their bot has a bug that caused this. I doubt they care, though.
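For anyone in the same boat: OpenAI does publish a user-agent string for its crawler ("GPTBot"), so a robots.txt at the site root can at least ask it to stay away. A minimal example (CCBot is Common Crawl's bot, often scraped for AI training too; well-behaved crawlers honor this, but nothing enforces it):

```
# robots.txt at the site root — asks compliant AI crawlers to stay out
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```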
I also asked for their billing address, so I can send them a bill for the $155 plus my time at a consulting rate. I know it's unlikely I'll ever see a dime. Fortunately my provider said they'd waive the fee as a courtesy, as long as I addressed the issue, but if OpenAI does end up coming through, I'll tell my provider not to waive it. OpenAI is responsible for this and should pay for it.
This incident reinforces all of my beliefs about AI: Use everyone else's resources and take no responsibility for it.
I have experienced something similar. I run a small forum for a computer game series, a series I myself haven't been interested in for a long time. I am just running it because the community has no other place to go, and they seem to really enjoy it.
A few months ago, I received word from them that the forum barely responded anymore. I checked it out and noticed there were several hundred active connections at any time, something we had never seen before. After checking the whois info on the IPs, I realized they all belonged to Meta, Google, Apple, Microsoft, and other AI companies.
It felt like a coordinated DDoS attack and certainly had almost the same effect. Now, I have a hosting contract where I pay a flat monthly fee for a complete server and any traffic going through it, so it was not a problem financially speaking, but those AI bots made the server almost unusable. Naturally, I went ahead and blocked all the crawler IPs that I could find, and that relieved the pressure a lot, but I still keep finding new ones.
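Alongside IP blocking, one option (a sketch, not the poster's actual setup) is to reject the crawlers by their published user-agent strings at the web server. This only catches bots that identify themselves honestly, and the names below are the ones these companies document (adjust to whatever appears in your access logs):

```
# nginx — goes in the http {} context
map $http_user_agent $ai_bot {
    default          0;
    "~*GPTBot"       1;
    "~*CCBot"        1;
    "~*ClaudeBot"    1;
    "~*Bytespider"   1;
}
```

Then inside the relevant `server {}` block: `if ($ai_bot) { return 403; }`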
Fuck all of those companies, fuck the lot of them. All they do is rob and steal and plunder, and leave charred ruins. And for what? Fan fiction. Unbelievable.
Maybe it's time to implement an AI tarpit. Each response to a request from a particular IP address or range takes double the time of the previous one, with something like a 30-second cool-down window before the response time halves.
Would stop AI scrapers in their tracks, but it wouldn't hurt normal users too much.
Maybe I should start looking into it a bit more 🤔
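The doubling/cool-down idea above could be sketched roughly like this (my own hypothetical sketch, not an existing tool; the class and parameter names are made up, and a real deployment would hook this into the web server and evict stale entries):

```python
import time


class Tarpit:
    """Per-client exponential backoff: each request from the same IP
    doubles its delay; each full cool-down window with no requests
    halves it back toward the base delay."""

    def __init__(self, base_delay=0.5, cooldown=30.0, max_delay=60.0):
        self.base_delay = base_delay  # seconds added to the first offending request
        self.cooldown = cooldown      # quiet seconds before the delay halves
        self.max_delay = max_delay    # cap so the state can't grow unbounded
        self.state = {}               # ip -> (current_delay, last_seen)

    def delay_for(self, ip, now=None):
        """Return how long to stall this request, and update per-IP state."""
        now = time.monotonic() if now is None else now
        delay, last_seen = self.state.get(ip, (0.0, now))
        idle = now - last_seen
        # Halve the delay once per full cool-down window of silence.
        while idle >= self.cooldown and delay > self.base_delay:
            delay /= 2
            idle -= self.cooldown
        # Double on each hit, starting from the base delay.
        delay = self.base_delay if delay < self.base_delay else min(delay * 2, self.max_delay)
        self.state[ip] = (delay, now)
        return delay
```

A normal user making a request every few minutes never accumulates a meaningful delay, while a crawler hammering the site every second quickly stalls itself for a minute per request.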
Apparently my phpBB forum served as a nice tarpit. The only thing I can figure is that they neglected to take session IDs into account, so they assumed every URL was a different page.
Not an expert or anything, but could a script be made that feeds a bot an endless stream of unique tinyurls that point to images OpenAI pays to host?
Thank you for continuing that site. We donate every month for a similar forum that we never use. I love that these places still exist, and appreciate those who help run them.
Could you run a script that presents the AI bots with alternative believable but incorrect text based information? That would be a great way to fight back.
You could even implement an AI to rewrite your content with intentional errors so you don't have to generate the misinformation yourself. Sounds like a great use for AI.
Nepenthes already does a better job of this than what you’re proposing and doesn’t require AI.
https://hackaday.com/2025/01/23/trap-naughty-web-crawlers-in-digestive-juices-with-nepenthes/
Nice