There are really compelling open-source models like Zonos coming out; ElevenLabs will need to figure out how to thread the needle to keep everyone happy while other solutions eat into the pie.
chiisana
If you can serve content locally without a tunnel (i.e., no CGNAT or port blocking by your ISP), you can configure your server to respond only to the Cloudflare IP ranges and your intranet IP range; slap on the Cloudflare origin cert for your domain and trust it for local traffic; enable the orange cloud; and ta-da: access from anywhere without a VPN. Traffic is encrypted externally between user <> Cloudflare and Cloudflare <> your service, and internally between user <> service; and only someone internal, or coming via Cloudflare, can access it. You can still put Zero Trust SSO on your subdomain so Cloudflare authenticates all users before proxying the actual request.
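A minimal sketch of the origin-side half of this setup, assuming nginx as the web server; the cert paths, domain, and upstream port are placeholders, and only two of Cloudflare's published edge ranges are shown:

```nginx
# Hypothetical nginx sketch: only Cloudflare's edge and the local LAN may connect.
server {
    listen 443 ssl;
    server_name app.example.com;

    # Cloudflare Origin CA cert (only trusted between Cloudflare and this origin,
    # so also trust it locally for intranet clients)
    ssl_certificate     /etc/ssl/cloudflare/origin.pem;
    ssl_certificate_key /etc/ssl/cloudflare/origin.key;

    # Allow Cloudflare edge ranges (illustrative subset; pull the
    # current full list from https://www.cloudflare.com/ips/)
    allow 173.245.48.0/20;
    allow 103.21.244.0/22;
    # ...remaining Cloudflare ranges...
    # Allow the intranet
    allow 192.168.0.0/16;
    deny  all;

    location / {
        proxy_pass http://127.0.0.1:8080;  # your actual service
    }
}
```

With the orange cloud enabled on the DNS record, external users only ever see Cloudflare's edge cert, while the origin cert covers both hops behind it.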
Yep! Give granite a try. I think that would be perfect for this use case, both in terms of being able to answer your queries and doing so quickly, without a GPU, just using a modern CPU. I was getting above 30 tokens per second on my 10th-gen i5, which kind of blew my mind.
Thinking models like r1 will be better at things like troubleshooting a faulty furnace or user problems, so there are benefits in pushing those envelopes. However, if all you need is to give basic instructions, have it infer your intent, and finally perform the desired tasks, then smaller mixture-of-experts models should be passable even without a GPU.
The DeepSeek referred to here seems to be v3, not r1. While the linked article didn’t seem to have info on parameter count, the fact that they state it is a sparse MoE architecture suggests it should run pretty quickly (compared to other models of similar parameter count), so that’s cool.
Depending on what you want to do with it, and what your expectations are, the smaller distilled versions could work on CPU, but they will most likely need extra help on top, just like other similarly sized models.
This being a reasoning model, you might get more well-thought-out results out of it, but at the end of the day: smaller parameter space (easiest to think of as “less vocabulary”), smaller capabilities.
If you just want something to very quickly chat back and forth with on a CPU, try IBM’s granite3.1-moe:3b, which is very fast even on a modern CPU, but doesn’t really excel at complex problems without additional support (i.e., RAG or tool use).
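Assuming Ollama as the runtime (the `granite3.1-moe:3b` tag above matches its model library naming; the prompt is just an illustration), trying it is a two-liner:

```shell
# Pull the MoE model once, then chat with it on plain CPU
ollama pull granite3.1-moe:3b
ollama run granite3.1-moe:3b "Summarize what a mixture-of-experts model is in one sentence."
```

Because only a few experts activate per token, the effective compute per token is far below what the total parameter count suggests, which is why CPU-only throughput stays usable.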
Shipping address appears to be US only. Oh well.
More than once I’ve heard the joke that “everything causes cancer in the state of California” (regardless of whether the product bore the warning label or not). While the intention may be good, I think the equivalent of notification fatigue is at play here, and the labels might not be delivering their intended benefit/value.
I wonder if it’s more because they’re hitting capacity limits as a result of the physical constraints of on-package memory design: with the distances involved, some memory simply doesn’t have the physical space to deliver consistent performance, which caps capacity. That’s the idea that crossed my mind.
So less of an “it’ll be more performant” thing, and more of a “this was great, but we have to backpedal because we need more memory than the space could permit” kind of thing?
Yeah. We came from a time of incandescent light bulbs taking 60W per bulb with fixtures needing 2-3 bulbs. Turning those off regularly mattered. The obsession people have with turning their modern electronics off in the name of power savings is silly if not outright insane.
Electronic components do not like having their power state change frequently. Turning devices on and off often will shorten a device’s lifespan. Sure, you are saving money on your electricity bill, but at some point the savings and environmental benefits are outweighed by the cost of replacing the device/parts and the impact of manufacturing them.
Also, don’t forget phantom draw from the power supply is a real thing, which will most likely exceed your 5 zeros threshold. So that microwave oven, and laundry dryer? Don’t forget to unplug those after each use.
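A back-of-the-envelope sketch of what phantom draw actually costs per year; the standby wattages and electricity rate here are illustrative assumptions, not measurements:

```python
# Rough annual cost of leaving a device plugged in at a given standby draw.
# Wattages and the $/kWh rate are assumed example values, not measurements.

def annual_standby_cost(watts: float, rate_per_kwh: float = 0.15) -> float:
    """Dollars per year for a device drawing `watts` in standby, 24/7."""
    kwh_per_year = watts * 24 * 365 / 1000  # W -> kWh over a full year
    return kwh_per_year * rate_per_kwh

# e.g. a microwave clock might idle around ~3 W, a dryer control board ~2 W
for name, watts in [("microwave", 3.0), ("dryer", 2.0)]:
    print(f"{name}: ~${annual_standby_cost(watts):.2f}/year")
```

Even at a few watts each, the totals land in the low single-digit dollars per device per year, which is the scale the sarcasm above is pointing at.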
Works very well on vanilla docker compose as well. Annotate containers with labels and it will wire itself up automagically. It’s wonderful.
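As an illustration of the label pattern, here is a minimal compose sketch assuming Traefik as the label-reading proxy (the thread doesn’t name a specific tool, and the service name and hostname are placeholders):

```yaml
# Hypothetical docker-compose sketch of label-driven wiring.
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false   # only route labeled containers
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro  # lets it watch containers

  whoami:
    image: traefik/whoami
    labels:
      - traefik.enable=true
      - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
      - traefik.http.routers.whoami.entrypoints=web
```

The proxy watches the Docker socket, so adding or removing a labeled container updates the routing with no restart or separate config file.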