greyfox

joined 2 years ago
[–] greyfox@lemmy.world 1 points 3 days ago* (last edited 3 days ago) (1 children)

Well, I am not claiming to be a ZFS expert, but I have been using it since 2008ish, both personally and professionally. So I am fairly certain that what I have said here is correct, semantics aside.

> Both lz4 and zstd have almost no performance impact on modern hardware.

So "almost" is zero in your mind? Why waste the CPU cycles on compressing data that is already compressed? I recognize that you might not care, but I sure do. And I wouldn't say it would be wrong to think that way.

> compression acts on blocks in ZFS, therefore it is enabled at the pool level

This is incorrect. You can run zfs set compression=lz4 dataset (or compression=off) on a per-dataset basis. You can see your compression efficiency per dataset by running zfs get compressratio dataset. If your blocks were written to a dataset with compression=off, you will see no compression for that dataset. You can absolutely mix compressed and uncompressed datasets in the same pool.
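If you want to try it, something like the following is a rough sketch of the per-dataset settings. The dataset names tank/media and tank/documents are made up for illustration, and the script only prints the commands unless you set ZFS=zfs yourself.

```shell
# Dry run by default: prints each command instead of executing it.
# Run with ZFS=zfs in the environment to apply the changes for real.
ZFS="${ZFS:-echo zfs}"

# Hypothetical dataset names; substitute your own.
$ZFS set compression=off tank/media
$ZFS set compression=lz4 tank/documents

# Show the setting and the achieved ratio for both datasets.
$ZFS get compression,compressratio tank/media tank/documents
```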

OP added a -O option to set compression when he created the pool, but that is not a pool-level setting. If you look at the documentation for zpool-create you will see that -O options are just properties passed to the root dataset, versus -o options, which are actual pool-level properties.
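To make the -o versus -O difference concrete, here is a minimal sketch. The pool name tank and the device paths are placeholders I picked, not anything from OP's setup, and again it only prints the commands by default.

```shell
# Dry run by default: prints each command instead of executing it.
ZPOOL="${ZPOOL:-echo zpool}"
ZFS="${ZFS:-echo zfs}"

# -o sets a pool property (ashift); -O sets a property on the root
# dataset (compression). Pool name and device paths are hypothetical.
$ZPOOL create -o ashift=12 -O compression=lz4 tank mirror /dev/sdX /dev/sdY

# The two kinds of property are read back with different tools.
$ZPOOL get ashift tank
$ZFS get compression tank
```

Child datasets inherit the root dataset's compression value unless you override it, which is why it can look like a pool-wide setting.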

You might be confusing compression and deduplication. Deduplication is more pool wide.

> ZFS does indeed need to allocate some space at the front and end of a pool for slop, metaslab, and metadata. I think you are confusing filesystem and datasets.

Well, yes and no here. You are right, I should have been calling them datasets. Dataset is a generic term, and there are different dataset types, like file systems, volumes and snapshots.

So yeah, I maybe should have been more generic and called them datasets, but unless OP is using block volumes we are probably talking about ZFS file systems here. Go to, say, the zfsprops man page and you will see file system mentioned about 60 times when discussing properties that can be set for file system type datasets.

> I'm not sure what you're trying to say about NFS and ZFS here, but this is completely false, even if you mean datasets.

It sounds like you are unaware of the native NFS/SMB integrations that ZFS has.

It is totally optional, but instead of using your normal /etc/exports to set NFS settings, ZFS can dynamically load export settings when your dataset is mounted.

This is done with the sharenfs property: zfs set sharenfs=<export options> dataset. Doing this means you keep your export settings with your pool instead of with the system it is mounted on. That way, if you replicate your pool to another system, those export settings automatically come with it.
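As a rough sketch of what that looks like, assuming made-up dataset names under a pool called tank: I am only using on and off here because the exact export option strings differ between platforms, so check your own man page before copying anything more specific.

```shell
# Dry run by default: prints each command instead of executing it.
ZFS="${ZFS:-echo zfs}"

# Share only the (hypothetical) music dataset with the default export options.
$ZFS set sharenfs=on tank/media/music

# Keep another (hypothetical) dataset unshared.
$ZFS set sharenfs=off tank/documents

# Review what is exported across every filesystem in the pool.
$ZFS get -r -t filesystem sharenfs tank
```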

There is also a sharesmb property for Samba.

My point then was that you should lay out your dataset hierarchy based on your expected permissions for NFS/SMB. You could certainly skip all of this and handle these exports manually yourself, in which case you wouldn't have to worry about separate filesystems and this point is moot.

My post was less about compression and more about saying that you should consider splitting your datasets based on what is in them because the more separate they are the more control you have. You gain a lot of control, and lose very little since it all comes from the same pool.

Some of the reasons I have these opinions are less important than they were a decade ago. For example, doing a 20 TB zfs send before resuming a send was possible sucked. Any little problem and you had to start over. Having more, smaller filesystems meant smaller sends. And yeah, I was using ZFS before lz4 was even an option and CPU was more precious back then, but I don't see any reason to waste CPU cycles when you can create a separate file system for your media and set compression to off on it.

And most importantly, I would want different snapshot policies for different data types. I don't need years' worth of retention for a movie collection, but I would like to have years' worth of retention on my documents filesystem to protect against accidental deletion. It is relatively small, so the storage consumed is minimal.
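A per-filesystem retention policy can be as simple as "keep the newest N snapshots". Below is a small sketch of that idea. The helper name prune_candidates, the dataset names and the retention counts are all made up by me, and in practice a dedicated snapshot tool would probably do this for you.

```shell
# prune_candidates N: read snapshot names on stdin (oldest first) and
# print every name except the newest N, i.e. the ones eligible for deletion.
prune_candidates() {
  awk -v keep="$1" '{ names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Demo with made-up names: keep the newest 2 of 3 snapshots.
printf '%s\n' tank/media@mon tank/media@tue tank/media@wed | prune_candidates 2
# prints tank/media@mon

# Against a real pool the input would come from something like:
#   zfs list -H -o name -s creation -t snapshot -d 1 tank/media     | prune_candidates 7
#   zfs list -H -o name -s creation -t snapshot -d 1 tank/documents | prune_candidates 730
```

The helper only prints names. Actually destroying the snapshots is left as a separate, deliberate step.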

[–] greyfox@lemmy.world 2 points 3 days ago

Yeah it won't make much difference these days.

I suppose my point was more that, because ZFS is a pool that can be split up with filesystems, new users should be thinking a little differently than they would have been used to with traditional RAID volumes/partitions.

With a normal filesystem, partitions are extremely limiting, requiring you to know how much space you need for each partition. ZFS filesystems just being part of the pool means that you can get logical separation between data types without needing that kind of pre-planning.

There are so many settings in ZFS that you may want to set differently between data types: compression, export settings, snapshot schedules, replicating particular datasets to other systems, quotas, etc.

So I was mostly just saying "you should consider splitting those up so that you can adjust settings per filesystem that make sense".

There is also a bit of danger with a single ZFS filesystem if you have no snapshots. ZFS being a copy-on-write filesystem means that even deleting something actually needs space. It is a bit counterintuitive, but deleting something means writing a new block first, then updating the FS to point at the new block. If you fill the pool to 100% you can't delete anything to free up space. Your only option is to delete a snapshot or destroy an entire filesystem to free up a single block so that you can clean up. If you don't have a snapshot to delete you have to destroy the entire filesystem, and if you only have one filesystem you need to back up and delete everything... ask me how I know this ;)

If you have several filesystems, you only need to back up and destroy the smallest one to get things moving again. Or better yet, have some snapshots you can roll off to free up space, or have quotas in place so that you don't fill the pool entirely.
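Here is a rough sketch of the quota idea plus a simple fullness check. The dataset name, the 8T quota, the 90% limit and the helper name over_threshold are all assumptions of mine. The capacity string is the kind of value that zpool list -H -o capacity prints.

```shell
# Dry run by default: prints the zfs command instead of executing it.
ZFS="${ZFS:-echo zfs}"

# Cap the (hypothetical) media dataset so it cannot consume the whole pool.
$ZFS set quota=8T tank/media

# over_threshold CAPACITY LIMIT: succeed when CAPACITY (e.g. "91%")
# is at or above LIMIT percent.
over_threshold() {
  [ "${1%\%}" -ge "$2" ]
}

# Example with a made-up reading; feed it the real value from zpool list.
if over_threshold "91%" 90; then
  echo "pool is getting full: roll off old snapshots"
fi
```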

[–] greyfox@lemmy.world 2 points 3 days ago (1 children)

Yep you can certainly do it that way.

If your concern is just that you already created it, you can shut it off after the fact. All new blocks written will have compression disabled, while blocks that were already written stay as they are.

[–] greyfox@lemmy.world 1 points 3 days ago (8 children)

You probably shouldn't enable compression on the root filesystem of the pool. Since you mention movies/TV shows/music, those are just going to waste CPU cycles compressing incompressible data.

Instead you should consider separate ZFS filesystems for each data type. Since ZFS is a pool, you don't have to pre-allocate space like you do with partitions, so there is no harm in having separate filesystems for each data type rather than a single large filesystem for everything. You can then turn on compression only for those filesystems that benefit from it.
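A minimal sketch of that layout, assuming a pool called tank and child names I made up. It prints the commands rather than running them unless you set ZFS=zfs.

```shell
# Dry run by default: prints each command instead of executing it.
ZFS="${ZFS:-echo zfs}"

# One filesystem per data type, each with its own compression setting.
# "tank" and the child names are hypothetical.
$ZFS create -o compression=off tank/media
$ZFS create -o compression=lz4 tank/documents
$ZFS create -o compression=lz4 tank/backups

# All of them draw on the same free space; nothing is pre-allocated.
$ZFS list -r -o name,used,avail,compression tank
```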

Also remember that many permissions, like NFS export settings, are set on a per-filesystem basis, so you should lay out your filesystems according to your data type and according to what permissions you want to give out for each filesystem.

For example, if you are going to have a Navidrome server to stream your music, you don't want to give that server access to your entire pool, just your music.

Separate filesystems also mean you can have different snapshot schedules/retentions. Documents might need to be snapshotted more often, and those snapshots kept around longer, than media.

[–] greyfox@lemmy.world 5 points 2 months ago* (last edited 2 months ago) (1 children)

Same here. For the last couple of weeks, infinite scroll has been broken on lemmy.world. I tried another instance and it still works fine, so I assume it is a lemmy.world problem.

The Boost app pops up a timeout when it happens, so presumably lemmy.world is dropping connections for some reason.

[–] greyfox@lemmy.world 3 points 2 months ago

Blockchain isn't really a great storage mechanism for large files; it is good for small transactions/records. This is why most NFTs are just links to content instead of storing the content in the blockchain itself. It is possible to store that data in the chain, but you usually pay transaction fees based on the size of that data. To store the data in the chain, everyone participating in the chain would need a full copy of every video posted, which isn't really feasible long term.

It is also nearly impossible to moderate. How do you remove illegal content like CSAM from something that is designed to be immutable?

So storing data like this is just not the right use case for blockchain.

Someone could create a site based on a blockchain where you distribute links to videos, but those links would still need to be stored somewhere else. You really aren't gaining much beyond just hosting the content yourself.

The only valid use case would be if you are attempting to avoid government influence that might try to modify your content later, or if you want to be able to prove that you are the one posting the content. You could use the blockchain to post a link to a video and the hash of the file to verify it is the correct file. Then anyone with access to the chain would have a record of who posted it and a way to validate that the content hasn't been modified.
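The verification half of that needs nothing fancy. Here is a small sketch, assuming the published hash is SHA-256 and that sha256sum is available. The helper name verify_hash and the file name in the usage comment are made up.

```shell
# verify_hash FILE EXPECTED: succeed only if FILE's SHA-256 digest equals
# EXPECTED, where EXPECTED stands in for the hash posted with the link.
verify_hash() {
  actual="$(sha256sum "$1" | awk '{ print $1 }')"
  [ "$actual" = "$2" ]
}

# Usage (hypothetical file name and variable):
#   verify_hash video.mp4 "$published_hash" && echo "file matches the posted hash"
```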

There are federated options like PeerTube that are close to what you want but without the blockchain issues.

[–] greyfox@lemmy.world 1 points 3 months ago (1 children)

Yeah looks like you are right. Appears that they just got control of it in Feb.

[–] greyfox@lemmy.world 13 points 3 months ago (1 children)

I think the better solution is that if a company is so important that it needs to be bailed out, then it should just get nationalized when it fails.

Our money goes towards bailing them out, but the public owns it after that. The shareholders that ran it into the ground shouldn't get to keep it.

[–] greyfox@lemmy.world 4 points 3 months ago (5 children)

I thought James Bond was special though? The family still gets to approve the script, so the issue is that Amazon wants to milk it with a crap story, and the family says no, this isn't good enough.

[–] greyfox@lemmy.world 7 points 3 months ago

Probably liability issues. Some customer doesn't see it, steps on it, and face plants into the floor, and then they get sued.

[–] greyfox@lemmy.world 1 points 3 months ago

From an insurance perspective these drugs are one of the largest reasons for premium increases in the last couple of years. The high cost combined with the number of Americans that medically qualify to get these covered (usually requirements are just high BMI or other diabetes risks) has increased insurance costs considerably.

So if they are trying to lower insurance premiums (or keep them in check at least) this is a good way to do it.

From a Medicare perspective losing weight is one of the best preventative things you can do for long term health, so getting these covered by Medicare could easily translate to long term savings.

[–] greyfox@lemmy.world 3 points 3 months ago (4 children)

"Tech companies don't care that students use calculators to cheat"

AI is just a tool like a calculator. No company cares about their employees beyond getting work out of them. If a potential employee shows that they can use the tools at their disposal to get the job done then why would they care?
