overview for Kryesh

githubDownDaily in c/programmer_humor@programming.dev

[–] Kryesh@lemmy.world 12 points 1 month ago

https://damrnelson.github.io/github-historical-uptime/

Jellyfin Buffering Slow Torrents in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 1 points 1 year ago (1 children)

What are the disks and how full is the pool?

Advice on a project idea for learning Rust! in c/programming@programming.dev

[–] Kryesh@lemmy.world 7 points 1 year ago* (last edited 1 year ago)

The biggest thing imo is to have the program do something you want/need; without that, motivation becomes difficult, and having a problem to solve gives you something to design for. I've been working away at my own side project to learn Rust for a while: a log search server called Crystalline.

I ended up using Poem for the HTTP server, Leptos + Tailwind/DaisyUI for the frontend, and Tantivy for the main data storage

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 1 points 2 years ago

I'm currently using the fluentbit http output plugin. Fluentbit can act as an otel collector with an input plugin, which could then be routed to the http output plugin. Long term I'll probably look at adding it, but there are other features that take priority in the app itself, such as scheduled searching and notifications/alerting.

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 2 points 2 years ago* (last edited 2 years ago) (1 children)

Applications like metrics because they're good for doing statistics, so you can figure out things like "is this endpoint slow" or "how much traffic is there".

Security teams like logs because they answer questions like "who logged in to this host between these times?" or "when did we receive a weird looking http request". Basically, any time you want to find specific details about a single event, logs are typically better, and threat hunting does a lot of analysis on specific one-time events.

Logs are also helpful when troubleshooting. Metrics can tell you there's a problem, but in my experience you'll often need logs to actually find out what the problem is so you can fix it.

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 1 points 2 years ago (2 children)

Thanks! I'm definitely aiming for a stupid easy installation/management for the app itself, but in my experience getting a wide range of supported log sources is no small feat. I've been using fluentbit to handle collection from different sources, and the following has been working well for me:

docker 'journald' log driver
fluentbit 'systemd' input
fluentbit 'http' output like the one in the readme

with that setup you can search for container logs by name which works great with compose:

or process logs from an nginx container like this to see traffic from external hosts:

I'll add a more complete example to the docs, but if you look in the repo there's a complete example for receiving and ingesting syslog that you can run with just "docker compose up"

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 2 points 2 years ago* (last edited 2 years ago)

Oh I wasn't using it as a full recursive resolver - just reading the resolv.conf set by docker and sending requests
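
To illustrate the "read resolv.conf and send requests" approach, here is a minimal Python sketch of the first half: pulling the `nameserver` entries out of a resolv.conf file. This is not the project's code (which is Rust and used trust-dns-resolver); the function name `parse_nameservers` and the sample file contents are made up for the example.

```python
def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# Illustrative contents only; a real container's file may differ.
sample = """\
# example resolv.conf
nameserver 127.0.0.11
options ndots:0
"""
print(parse_nameservers(sample))
```

A stub resolver like this just forwards queries to the listed servers instead of walking the DNS hierarchy itself.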

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 3 points 2 years ago (2 children)

More good points, thank you! trust-dns-resolver is a relic from a previous iteration that polled external sources and needed to resolve DNS records. Since I haven't gotten around to re-implementing that feature, it should be removed. As for why: I actually needed to bring my own resolver since the docker container is a scratch image containing only some base directories and the server binary, so there isn't any OS etc. to lean on for things like DNS. That means the whole image is ~15.5MB, which is nice and negates a whole class of vulnerabilities.

Understood that your actual point is to document this stuff and not answer the trivia question though

Building my own log aggregation and search server in c/selfhosted@lemmy.world

[–] Kryesh@lemmy.world 10 points 2 years ago (4 children)

Thanks! It's definitely got a way to go before it's remotely competitive with any of the enterprise solutions out there, but you make a good point about having comparisons, so I'll look at adding them.

I'm basically building it to have a KQL/LogScale/Splunk/Sumologic style search experience while being trivial to deploy (relative to others at least...) since I miss having that kind of search tooling when not at work; but I don't want to pay for or maintain that kind of thing in a lab context. It creates a Tantivy index per day for log storage (with scoring and postings disabled for space savings).
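
As a rough illustration of the index-per-day idea, here is a small Python sketch of how a timestamp could map to a daily index and how a search over a time range would only need to open the days it overlaps. This is not the actual implementation (that is Rust on top of Tantivy); the `events-YYYY-MM-DD` naming, the UTC day boundary, and the function names are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def index_name(ts: datetime) -> str:
    """Hypothetical naming scheme: one index per UTC calendar day."""
    return ts.astimezone(timezone.utc).strftime("events-%Y-%m-%d")


def indices_for_range(start: datetime, end: datetime) -> list[str]:
    """List the daily indices a search over [start, end] would need to open."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"events-{day.isoformat()}")
        day += timedelta(days=1)
    return names


# Timezone-aware timestamps spanning three calendar days.
start = datetime(2024, 1, 30, 23, 0, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, 1, 0, tzinfo=timezone.utc)
print(index_name(start))
print(indices_for_range(start, end))
```

Partitioning by day like this also makes retention simple in principle, since expiring old data amounts to dropping whole indices.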

In the end, my main goal for the project was to use it as a vehicle to get better at programming, and if I get a tool I can use for my lab then that's great too lol.

133

Building my own log aggregation and search server (lemmy.world)

submitted 2 years ago* (last edited 2 years ago) by Kryesh@lemmy.world to c/selfhosted@lemmy.world

19 comments

Hi everyone, I've been building my own log search server because I wasn't satisfied with any of the alternatives out there and wanted a project to learn Rust with. It still needs a ton of work, but I wanted to share what I've built so far.

The repo is up here: https://codeberg.org/Kryesh/crystalline

and I've started putting together some documentation here: https://kryesh.codeberg.page/crystalline/

There are a lot of features I plan to add to it, but I'm curious to hear what people think and if there's anything you'd like to see out of a project like this.

Some examples from my lab environment:

events view searching for SSH logins from systemd journals and syslog events:

counting raw event size for all indices:

Performance is looking pretty decent so far, and it can be configured to not be too much of a resource hog depending on use case. Some numbers from my test install:

raw events ingested: ~52 million
raw event size: ~40GB
on disk size: ~5.8GB
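
For a rough sense of what those numbers imply, here is a quick back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes) and treats the approximate figures above as exact, so the results are ballpark values only.

```python
# Back-of-the-envelope figures from the approximate numbers above.
# Assumption: decimal units (1 GB = 1e9 bytes).
events = 52_000_000
raw_bytes = 40e9
disk_bytes = 5.8e9

compression_ratio = raw_bytes / disk_bytes  # raw size relative to on-disk size
avg_raw_event = raw_bytes / events          # average raw bytes per event
avg_disk_event = disk_bytes / events        # average on-disk bytes per event

print(f"compression ratio: ~{compression_ratio:.1f}x")
print(f"average raw event: ~{avg_raw_event:.0f} bytes")
print(f"average on-disk cost per event: ~{avg_disk_event:.0f} bytes")
```

That works out to roughly a 7x reduction on disk, with the average raw event a bit under 800 bytes.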

RAM usage:

when not running searches and ingesting 600MB-1GB per day, it uses about 500MB of RAM
running the SSH search examples above brings it to about 600MB of RAM while the search is running
running the last example search, getting the size of all events (requires decompressing the entire event store), peaked at about 3.5GB of RAM usage