CrowdSec•8mo ago

Hardware Performance Guidance and Bottleneck Debugging

Background I'm currently running CS in docker containers on a Raspberry Pi 5. Rough overview: * CS primary container - CAPI, LAPI, used by nginx bouncer (remote on another machine) * CS secondary container - Child log processor acquiring from nginx docker source (remote on another machine) * 9 collections installed * 10 parsers (http-logs, nginx-logs, some enrich, non-syslogs, whitelists...) CS works fine for my setup. 99% of the time it's using less than 1% cpu. My server averages 0-30 req/s ingress. But every once in awhile I do something that gets a ton of ingress (mastodon post) and have 1000+ req/s at which point CS takes my Pi5 to 100% cpu usage. It looks to be about 50/50 split between secondary container and primary (for bouncing, I'd guess). Problem I've looked through the docs and done some thorough googling. * CS Docs recommend ARM, 1 cpu core, 100 ram minimum but this is clearly not sufficient for large loads. * Config docs specify some routines settings but other than this one blog post its not clear what the recommended values for these should be based on hardware or load There is very little documentation on how to scale hardware based on expected load. There is, as far as I can tell, no documentation on how to troubleshoot bottlenecks or otherwise get insight into what would need to be tweaked in order to get better performance. Can the CS team provide a bit of insight on these topics?

21 Replies

CrowdSec•8mo ago

Important Information

Thank you for getting in touch with your support request. To expedite a swift resolution, could you kindly provide the following information? Rest assured, we will respond promptly, and we greatly appreciate your patience. While you wait, please check the links below to see if this issue has been previously addressed. If you have managed to resolve it, please use run the command /resolve or press the green resolve button below.

Log Files

If you possess any log files that you believe could be beneficial, please include them at this time. By default, CrowdSec logs to /var/log/, where you will discover a corresponding log file for each component.

Guide Followed (CrowdSec Official)

If you have diligently followed one of our guides and hit a roadblock, please share the guide with us. This will help us assess if any adjustments are necessary to assist you further.

Screenshots

Please forward any screenshots depicting errors you encounter. Your visuals will provide us with a clear view of the issues you are facing.

GNU Plus Windows User•8mo ago

There's a few things you can try to reduce CPU usage: - Use the firewall bouncer over the NGINX bouncer, it'll be a fair bit lighter since the bad IPs will be blocked at the network level, reducing the amount of logs CrowdSec has to parse, and for NGINX having to process those requests. - Remove any collections, parsers, and scenarios you don't need, the geoip parser in particular is quite heavy - Enable RE2 feature flag for a big reduction in CPU at the cost of more RAM usage https://docs.crowdsec.net/docs/next/configuration/feature_flags#list-of-available-feature-flags

Feature Flags | CrowdSec

In order to make it easier for users to test and experiment with new features, CrowdSec uses the concept of "feature flags".

FoxxMDOP•8mo ago

Thanks for the feedback. I'll remove geoip, I had a hunch that was costly and I'm only using it for nice looking notifications to discord. Firewall bouncer isn't an option, unfortunately, because 1) the burst of requests are all valid (normal operation of mastodon when posting to a highly-followed user) and 2) I'm using cloudflare tunnels and CS recommends not using the firewall bouncer on the free tier

Enable RE2 feature flag

From the docs and cscli output I was under the impression this was deprecated and will be removed?

GNU Plus Windows User•8mo ago

Firewall bouncer isn't an option, unfortunately, because 1) the burst of requests are all valid (normal operation of mastodon when posting to a highly-followed user) and 2) I'm using cloudflare tunnels and CS recommends not using the firewall bouncer on the free tier

You can try the Cloudflare bouncer instead, but it'll be heavily rate limited. Not sure how much help that'll be.

From the docs and cscli output I was under the impression this was deprecated and will be removed?

I think it's going to be enabled by default at some point, but I'm personally using that flag and it had a nice impact on performance

FoxxMDOP•8mo ago

Thanks for the clarification. Do you have any guidance on tuning routines settings? Also I realize (I think) you are a power user and not a dev so getting info on hardware scaling may be out of scope. Is there somewhere better I can post this question to get dev attention?

GNU Plus Windows User•8mo ago

that's for the AppSec WAF, are you using that?

Also I realize (I think) you are a power user and not a dev so getting info on hardware scaling may be out of scope. Is there somewhere better I can post this question to get dev attention?

A dev should get to you if your still having issues but I've been using CrowdSec for years so I'm fairly experienced

FoxxMDOP•8mo ago

I am not using appsec yet. I didn't realize those settings were specific to that feature. What has been your personal experience with hardware requirements for the peak loads I have experienced?

GNU Plus Windows User•8mo ago

fyi, AppSec is pretty heavy so you might want to offload the WAF functionality to Cloudflare. CrowdSec is pretty heavy since it makes heavy use of regex, so I wouldn't expect a raspberry pi to handle it well except for a low traffic home server. 1000+ requests a second would probably be a bit too much if you can offload some of those requests to Cloudflare via caching, that'll be a big help.

FoxxMDOP•8mo ago

Gotcha. I'll enable re2 and remove geo-ip but it sounds like I should migrate it to a beefier server I have in my lab.

GNU Plus Windows User•8mo ago

@FoxxMD https://github.com/crowdsecurity/crowdsec-docs/issues/735 fyi should be better documented in the future

GitHub

Add a dedicated section on performance tuning · Issue #735 · crowds...

Performance advice is currently scattered all around the docs, some not at all making it difficult to better tune your CrowdSec installation for performance.

iiamloz•8mo ago

hey 👋 the biggest cpu cost is this one scenario https://app.crowdsec.net/hub/author/crowdsecurity/scenarios/http-bad-user-agent so you shouldnt have to remove the geoip module as that should be pretty light @FoxxMD but yeah it sounds good to do, just dont know exactly where it would end up.

CrowdSec Console

Hub configuration

Use CrowdSec Console to visualize security data, manage dynamic blocklists, and gain real-time intelligence on IPs. Enhance your threat response capabilities.

FoxxMDOP•8mo ago

Thanks for the feedback @iiamloz . That is present in my config, I'll figure out what collection is including it and remove it. Glad to know I can keep geoip as that is nice info to get pinged with

iiamloz•8mo ago

For your one container you can add DISABLE_SCENARIOS environment and for bare metal just run cscli scenarios remove 😄

FoxxMDOP•8mo ago

oh that's dope! I see that now in docker_start.sh I've been having to dig into the source code to find all these extra, useful ENVs. Wish they were in docs 🥲

GitHub

crowdsec/docker/docker_start.sh at master · crowdsecurity/crowdsec

CrowdSec - the open-source and participative security solution offering crowdsourced protection against malicious IPs and access to the most advanced real-world CTI. - crowdsecurity/crowdsec

iiamloz•8mo ago

they should be on docker readme, but its not the best place to find them https://hub.docker.com/r/crowdsecurity/crowdsec

FoxxMDOP•8mo ago

ah i see that now, thanks for the tip I moved CS to a more powerful machine (10 cores on an i5-13400), enabled re2, and disbaled user agent matching. ofc the better machine helped and my issues are functionally resolved. i did reply to daniel stenberg (curl owner) which netted ~4000 requests over about 10 seconds and it still saturated all 10 cores but at least my systems and web server didn't freeze this time 😵‍💫 marking this as solved.

GNU Plus Windows User•8mo ago

another idea just came to mind, are you running a reverse proxy, are you parsing the logs on both your reverse proxy and webserver, or just the reverse proxy?

FoxxMDOP•8mo ago

Running a reverse proxy. I've offloaded the log parsing to a separate log processor. The reverse proxy uses bouncer from a primary CS instance not processing any logs. Generally:

GNU Plus Windows User•8mo ago

but are you parsing both the web server logs (backend) and reverse proxy, or just your reverse proxy?

FoxxMDOP•8mo ago

I'm not sure what the difference is, in this case. this traefek instance only serves external requests, from cloudflare tunnel. requests are written to access.log oh actually this is nginx 😂 but the setup is entirely the same. im in the middle of migrating to traefik but external is still all through nginx oh i think i understand what you mean . reverse proxy -> routes to -> service a -> logs -> logs Cs is only processing logs from the reverse proxy

GNU Plus Windows User•8mo ago

aah good, because if you were parsing both on the web server and reverse proxy you'd be parsing the same logs twice

Gaming

Programming

Hardware Performance Guidance and Bottleneck Debugging

Did you find this page helpful?