this post was submitted on 18 Mar 2025
I mean, yeah? This isn't a bug, this is just the consequence of how you have it set up. You're telling your browser to check this file with (likely) 100,000+ entries in it on each page load. If this is something you'd like to do, then you should be running AdGuard Home or Pi-hole. Using a hosts file directly is a really bad idea.
Unless they use a computer from the '80s, there's no reason a large hosts file should slow down programs that badly.
yeah. this is a bug.
Yep, precisely.
It's also quite literally one of the recommended methods of installation for e.g. UHB, for which there's even a pre-made script in the repo.
Edit: Also, Chromium devs are aware of this use case and have even added optimizations for it in the past, as visible in the highlighted comment. And the max hosts file size defaults to 32 MiB, which is well over the size I'm using (24 MiB). That makes it even weirder for it to bog down completely when experimenting with a ~250 MiB hosts file, as it should just reject it outright according to the implementation.
TLDR: looks like you're right, although Chrome shouldn't be struggling with that amount of hosts to chug through. This ended up being an interesting rabbit hole.
My home network already uses unbound with a proper blocklist configured, but I can't use the same setup directly with my work computer as the VPN sets its own DNS. I can only override this with a local resolver on the work laptop, and I'd really like to get by with just systemd-resolved instead of having to add dnsmasq or similar for this. None of the other tools I use struggle with this setup, as they use the system IP stack.
Might well be that Chromium has a somewhat more sophisticated network stack (than just using the system-provided libraries), and I remember the docs indicating something about that being the case. In any case, it's not like the code is (or should be) paging through the whole file every time there's a query: either it forwards it to another resolver or does it locally, but either way there will be a cache. That cache will then end up holding the queried domains in order of access, after which having a long /etc/hosts won't matter. Worst case scenario after paging in the hosts file initially is 3-5 ms (per query) for comparing through the 100k-700k lines before hitting a wall, and that only needs to happen once regardless of where the actual resolving takes place. At a glance, the Chrome net stack should cache queries into the hosts file as well. So at the very least it doesn't really make sense for it to struggle for 5-10 seconds on every consecutive refresh of the page with a warm DNS cache in memory...
...or that's how it should happen. Your comment inspired me to test it a bit more, and lo: after trying out a hosts file with 10 000 000 bogus entries, Chrome was brought completely to its knees. However, that amount of string comparisons is absolutely nothing in practice: Python with its measly lists and slow interpreter manages comparing against every row in 300 ms, and a crude C implementation manages it in 23 ms (approx. 2 ms with 1 million rows, both a lot more than what I have appended to the hosts file). So the file being long should have nothing to do with it unless there's something very wrong with the implementation. Comparing against /etc/hosts should be cheap as it doesn't support wildcard entries; as such, the comparisons are just a simple 1:1 check against the first matching row. I'll continue investigating and see if there's a quick change to be made in how the hosts are read in. Fixing this shouldn't cause any issues for other use cases from what I see.
For reference, if you want to check the performance for 10 million comparisons on your own hardware:
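A minimal sketch of such a worst-case comparison benchmark in Python. This is not the commenter's own script; the function names and the bogus hostnames are made up for illustration, and the timings will differ from the figures quoted above depending on hardware.

```python
import time


def build_entries(count):
    """Generate bogus hostnames standing in for blocklist rows (made-up names)."""
    return [f"bogus-{i}.example.invalid" for i in range(count)]


def linear_lookup(entries, name):
    """Compare the query against every row, like a naive top-to-bottom scan."""
    for entry in entries:
        if entry == name:
            return True
    return False


def time_lookup(entries, name):
    """Return (found, elapsed_seconds) for a single linear lookup."""
    start = time.perf_counter()
    found = linear_lookup(entries, name)
    return found, time.perf_counter() - start


def main(count):
    entries = build_entries(count)
    # Worst case: the name is absent, so every single row gets compared.
    found, elapsed = time_lookup(entries, "not-in-the-list.example.invalid")
    print(f"{count} rows, found={found}, scan took {elapsed * 1000:.1f} ms")


if __name__ == "__main__":
    # Raise this to 10_000_000 to match the test described above; that
    # likely needs on the order of a gigabyte of RAM for the strings alone.
    main(1_000_000)
```

Even this naive scan never touches the disk after the list is built, so it only measures the string comparisons themselves.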
Nice deep dive.
I would have assumed the hosts file got cached, indexed and re-read if the file changes. Surely it's not read and parsed for every single hostname lookup.
My adblock list is in BIND9 anyway, so I don't get this issue. I can see it definitely takes a second or two to parse the whole list on startup.
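The behaviour assumed here (parse once, index by name, re-read only when the file changes) could look roughly like the following. This is a hypothetical sketch, not how Chromium, BIND9 or systemd-resolved actually implement it; the HostsCache class and its change detection via modification time are assumptions made for illustration.

```python
import os


class HostsCache:
    """Parse a hosts file once, index it by name, reload only when it changes."""

    def __init__(self, path="/etc/hosts"):
        self.path = path
        self._mtime = None  # modification time of the last parsed version
        self._index = {}    # hostname -> address

    def _parse(self):
        index = {}
        with open(self.path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # Drop comments, then split into "address name [alias ...]".
                fields = line.split("#", 1)[0].split()
                if len(fields) < 2:
                    continue
                address, names = fields[0], fields[1:]
                for name in names:
                    # Keep the first matching row, as a top-to-bottom scan would.
                    index.setdefault(name.lower(), address)
        return index

    def lookup(self, name):
        """Return the mapped address, or None if the name is not listed."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # File is new or was modified since the last parse: rebuild the index.
            self._index = self._parse()
            self._mtime = mtime
        return self._index.get(name.lower())
```

With an index like this, each lookup costs one stat call plus one dictionary access, regardless of how many rows the file has; only the initial parse scales with file size.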
Don't seem to be any disk reads on request at a glance, though that might just be due to read caching on OS level. There's a spike on first page refresh/load after dropping the read cache, so that could indicate reading the file in every time there's a fresh page load. Would have to open the browser with call tracing to be sure, which I'll probably try out later today.
For my other devices I use unbound hosted on the router, so this is the first time encountering said issue for me as well.
You're using software to do something it wasn't designed to do. So this comment is beyond meaningless. There's no value whatsoever in it.
So then why would you even think to do something like this? Like....why?
well if you would bother to read what they have written.. oh I see, then you couldn't be so condescending
As such, Chrome isn't exactly following the best practices either – if you want to reinvent the wheel at least improve upon the original instead of making it run worse. True, it's not the intended method of use, but resource-wise it shouldn't cause issues – at this point one would've needed active work to make it run this poorly.
As I said, due to company VPN enforcing their own DNS for intranet resources etc. Technically I could override it with a single rule in configuration, but this would also technically be a breach of guidelines as opposed to the more moderate rules-lawyery approach I attempt here.
If it was up to me the employer should just add some blocklist to their own forwarder for the benefit of everyone working there...
But guess I'll settle for local dnsmasq on the laptop for now. Thanks for the discussion 👌🏼