this post was submitted on 08 Nov 2024
325 points (97.9% liked)

Selfhosted


I saw this post and I was curious what was out there.

https://neuromatch.social/@jonny/113444325077647843

I'd like to put my lab servers to work archiving US federal data that's likely to get pulled. Climate and biomed data seem most likely. The most obvious strategy to me seems to be setting up mirror torrents on academictorrents. Is anyone compiling a list of at-risk data yet?

[–] yasser_kaddoura@lemmy.world 32 points 2 years ago* (last edited 2 years ago) (3 children)

I have a script that archives to:

I used to solely depend on archive.org, but after the recent attacks, I expanded my options.

Script: https://gist.github.com/YasserKa/9a02bc50e75e7239f6f0c8f04fe4cfb1

EDIT: Added script. Note that the script doesn't include archiving to archivebox, since its API isn't available in a stable version yet. You can add a function depending on your setup. Personally, I rely on Caddy and Docker, so I use the caddy-exec module [1] to execute commands, with this in my Caddyfile:

route /add {
	@params query url=*
	exec docker exec --user=archivebox archivebox archivebox add {http.request.uri.query.url} {
		timeout 0
	}
}

[1] https://github.com/abiosoft/caddy-exec

[–] WhyJiffie@sh.itjust.works 4 points 2 years ago (1 children)

isn't this prone to a

 || rm -rf /

or something similar at the end of the URL?

if you can docker exec, you already have a lot of privileges, so make sure this is not a danger

[–] yasser_kaddoura@lemmy.world 2 points 2 years ago* (last edited 2 years ago)

Thank you for the warning. You are correct. It's prone to command injection. I will validate the URL before executing it. This should suffice until archivebox's REST API is available in stable.
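For anyone wiring up something similar, here is a minimal sketch of what "validate the URL before executing it" could look like. It is not the author's script. The function names (`is_safe_url`, `build_archive_command`, `archive`) are hypothetical, and the `docker exec` arguments are copied from the Caddyfile above. It assumes a small Python wrapper runs the command instead of a shell, so the URL is always passed as a single argument.

```python
from __future__ import annotations

import subprocess
from urllib.parse import urlsplit

# Assumption: only plain web URLs should ever reach the archiver.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str) -> bool:
    """Accept only http(s) URLs with a host and no whitespace or control characters."""
    # Reject empty input and anything that could be read as a command-line option.
    if not url or url.startswith("-"):
        return False
    # Reject whitespace and control characters, which rules out " || rm -rf /".
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


def build_archive_command(url: str) -> list[str]:
    """Build the argument list; the URL is one argv entry and is never shell-parsed."""
    if not is_safe_url(url):
        raise ValueError(f"refusing to archive suspicious URL: {url!r}")
    return [
        "docker", "exec", "--user=archivebox",
        "archivebox", "archivebox", "add", url,
    ]


def archive(url: str) -> None:
    """Run the command without a shell (hypothetical helper, mirrors the Caddyfile)."""
    subprocess.run(build_archive_command(url), check=True)
```

Passing a list to `subprocess.run` without `shell=True` means shell operators in the URL are never interpreted. The validation is a second layer that also blocks option-style arguments and non-web schemes.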

[–] opulentocean@lemm.ee 2 points 2 years ago (1 children)

Would you be willing to share it?

[–] Appoxo@lemmy.dbzer0.com 1 points 2 years ago* (last edited 2 years ago)

I hope you are also donating to the projects, since you are uploading multiple copies to different services.