Heavy load on MariaDB
Hello,
I came across a similar topic on the forum and I'm currently experiencing the same issue. (https://discourse.crowdsec.net/t/heavy-load-on-mariadb-multi-servers-environment/1738)
MariaDB is consuming all available CPU resources (I’ve tested with 8, then 12, and even 16 vCores), especially during SQL query execution. I also ran MySQLTuner, but the issue still persists.
Is there currently any known solution or recommended approach to optimize the configuration and reduce this high CPU usage?
Result of MySQLTuner:
[!!] Slow queries: 9% (68K/736K)
[!!] Joins performed without indexes: 196741
Thanks in advance for your help!
How many bouncers / log processors do you have ?
Yes, I have several servers that report back to the main server
How many ?
Also, how many decisions/alerts do you have in the database ? (you can get the count with
cscli decisions list --all | wc -l
and
cscli alerts list --all | wc -l
, or directly from the DB by running
SELECT count(*) FROM alerts
and
SELECT count(*) FROM decisions
, which will likely be a bit faster).
ALERTS : 45866
DECISIONS : 62264
that seems reasonable
Is the load constant or not ?
If it is constant, it could be because of the bouncers querying LAPI to get the decisions
If it spikes every ~2h or so, then it's the blocklist content being refreshed
If you could also try to log some of the slow queries so we know where to look, it would be useful
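A minimal sketch of how to capture them on MariaDB, assuming you can set globals at runtime (the 1-second threshold is an example value):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second
SET GLOBAL log_queries_not_using_indexes = ON;  -- matches the MySQLTuner warning above
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';  -- where the log is written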
This seems to be very constant
what's the output of
cscli bouncers list
?
Only my Python bouncer
what's the refresh interval of the bouncer ?
20 s from memory
hmmm
that's really weird then
you'll need to log some slow queries; a single bouncer querying every 20s should not put that much load on the DB
Example:
Interval between the last API pulls:
2025-06-16T12:00:23Z
2025-06-16T12:00:57Z
That looks like either a call to
cscli alerts list --all
or the periodic flush crowdsec performs in the database (which is 5k alerts/7 days by default).
Can you show me the content of /etc/crowdsec/config.yaml ?
flush:
  max_items: 251000
  max_age: 7d
😅
Sometimes I go over the limit, so I increased it.
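For reference, those settings live under db_config in /etc/crowdsec/config.yaml; with the 5k/7d defaults mentioned above, the section would look roughly like this (surrounding keys omitted):

db_config:
  flush:
    max_items: 5000
    max_age: 7d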
even 250k alerts should be easily manageable with a single bouncer
how fast is the storage of the mysql server ?
that's the only thing that comes to mind
to confirm it's the flush, you can try to disable it by setting max_items and max_age to 0 and restart crowdsec, the load should go down
IDK
is it HDD or SSD ?
SSD
If I do it, won't it clear the database ?
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 1h 49m 31s (1M q [181.455 qps], 2K conn, TX: 3G, RX: 493M)
[--] Reads / Writes: 86% / 14%
[--] Binary logging is disabled
[--] Physical Memory : 7.8G
[--] Max MySQL memory : 3.8G
[--] Other process memory: 0B
[--] Total buffers: 801.0M global + 2.9M per thread (1000 max threads)
[--] Performance_schema Max memory usage: 149M
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 1.2G (15.83% of installed RAM)
[OK] Maximum possible memory usage: 3.8G (49.04% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[!!] Slow queries: 9% (111K/1M)
[OK] Highest usage of available connections: 10% (104/1000)
[OK] Aborted connections: 0.00% (0/2765)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 10K sorts)
[!!] Joins performed without indexes: 322193
[OK] Temporary tables created on disk: 0% (862 on disk / 1M total)
[OK] Thread cache hit rate: 96% (107 created / 2K connections)
[OK] Table cache hit rate: 99% (3M hits / 3M requests)
[OK] table_definition_cache (400) is greater than number of tables (304)
[OK] Open file limit used: 0% (62/32K)
[OK] Table locks acquired immediately: 100% (1K immediate / 1K locks)
no, we use the 0 value to check if we should do the flush or not: https://github.com/crowdsecurity/crowdsec/blob/master/pkg/database/flush.go#L253
I have just set it to 0, and the service no longer wants to start.
Which error do you get ?
that's not related I think
Jun 16 14:23:31 xxx systemd[1]: Stopped crowdsec.service - Crowdsec agent.
Jun 16 14:23:31 xxx systemd[1]: Starting crowdsec.service - Crowdsec agent...
Jun 16 14:23:31 xxx crowdsec[30334]: FATAL api server init: unable to run local API: max_items can't be zero or negative
there's a small inconsistency: we do not allow max_items to be 0 when checking the config
Jun 16 14:25:31 xxx systemd[1]: Starting crowdsec.service - Crowdsec agent...
Jun 16 14:25:31 xxx crowdsec[30375]: FATAL api server init: unable to run local API: max_items can't be zero or negative
Jun 16 14:25:31 xxx mariadbd[1027]: 2025-06-16 14:25:31 3433 [Warning] Aborted connection 3433 to db: 'xxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Jun 16 14:25:31 xxx mariadbd[1027]: 2025-06-16 14:25:31 3432 [Warning] Aborted connection 3432 to db: 'xxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Jun 16 14:25:31 xxx mariadbd[1027]: 2025-06-16 14:25:31 3431 [Warning] Aborted connection 3431 to db: 'xxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Jun 16 14:25:31 xxx systemd[1]: crowdsec.service: Control process exited, code=exited, status=1/FAILURE
Jun 16 14:25:31 xxxx systemd[1]: crowdsec.service: Failed with result 'exit-code'.
Jun 16 14:25:31 xxx systemd[1]: Failed to start crowdsec.service - Crowdsec agent.
Should I put 1?
no, it will then just keep a single alert
Okay
just comment the entire
flush
section, it will also disable it
Okay good !
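Commenting it out in /etc/crowdsec/config.yaml would look something like this (same db_config layout assumed as above):

db_config:
  # flush:
  #   max_items: 251000
  #   max_age: 7d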
crowdsec seems to have issues establishing the connection
how's the load on the mysql server ?
1449 % 😅
I'll restart it
After a restart, it's still the same.
same slow queries ?
[!!] Slow queries: 6% (2K/38K)
Could it be from my
/v1/allowlists/check/X.X.X.X
checks?
Because I scan some machines with a honeypot, so I check that the IPv4 is not already banned.
no, those only use the
allowlists
tables, and the queries are fairly simple (I haven't tested them with mysql, but I can't imagine them causing those kinds of issues)
Hmm
slow-query.log :
can you try to stop the bouncer ?
That cuts the BGP sessions...
that's not ideal 😄
then, just increase the refresh interval to something very high, like an hour
Hmmm
Do you think the bouncer could be the cause of the problem?
I'm not sure
A bouncer query should not use the alerts table, and since the flush is disabled, I don't see anything else that could make those kinds of queries
which bouncer exactly are you using ?
And what's your crowdsec version ?
python-bouncer/0.0.5
version: v1.6.8-debian-pragmatic-amd64-f209766e
I only include decisions in my script
can you share your script ?
Yes, only as a private message
sure
It's sent !
i'm really at a loss
We know that MySQL is not always happy with our queries and is likely the least performant database we support, but it still shouldn't behave like this
Hmm
Why doesn't CrowdSec perform indexing?
we do have indexes
but we are probably missing some
OK, so at the moment there's no solution other than waiting?
you could try switching to postgres or sqlite and see how it behaves
note that we do not provide a migration script between databases, so you'll probably need to copy some things by hand
For me, that would just be working around the problem; you'd have to understand why...
yes
but that's the best I can offer for now
going over all queries to make sure we have indexes everywhere needed is going to take some time (and adding indexes without any planning can have a negative impact)
what's the max open connections to that db instance (as a graph) and what do you have set in the config?
Hello @KaszpiR,
I currently allow 1,000 connections, and there are 102 threads connected
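For anyone checking the same thing, the live values can be read with standard MariaDB status queries:

SHOW GLOBAL VARIABLES LIKE 'max_connections';    -- configured limit (1000 here)
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- currently connected (102 here)
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark since restart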
So I finally solved my problem.
If you're experiencing the same issues I did, here's the behavior that was causing it:
I had a script that checked whether an IPv4 address was banned using the command:
cscli decisions list --ip IP -o json
I’ll admit, this command was running continuously in my process.
It was checking millions of addresses, so after a minute or so, it would eventually overwhelm the system.
The solution was simply to switch to an HTTP request, which is much less resource-intensive.
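A minimal sketch of that kind of HTTP check against LAPI (the /v1/decisions endpoint and X-Api-Key header are the standard bouncer API; the URL and key below are placeholders for your setup):

import requests

LAPI_URL = "http://127.0.0.1:8080/v1/decisions"  # placeholder: your LAPI address
API_KEY = "<bouncer-api-key>"                    # placeholder: from `cscli bouncers add`

def is_banned(ip: str) -> bool:
    # LAPI answers with JSON null when no decision matches the IP
    resp = requests.get(LAPI_URL, params={"ip": ip},
                        headers={"X-Api-Key": API_KEY}, timeout=5)
    resp.raise_for_status()
    return resp.json() is not None

print(is_banned("192.0.2.1"))  # True if a ban decision exists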
Thanks to @blotus for guiding me through this issue.
Looking at the slow-query.log, the DB engine had to scan about 500K rows to return just 2 rows. There are definitely some indices missing. Run EXPLAIN on that query to find out. Also, subqueries like "IN (SELECT ...)" are performance killers; they probably caused multiple full-table scans.
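To illustrate that last point, a hypothetical comparison (the table and column names here are made up, not the actual crowdsec schema):

-- Subquery form: the optimizer may rescan the subquery result per row
EXPLAIN SELECT * FROM decisions
WHERE alert_id IN (SELECT id FROM alerts WHERE source_ip = '192.0.2.1');

-- Equivalent JOIN: usually a better plan, provided alert_id and source_ip are indexed
EXPLAIN SELECT d.* FROM decisions d
JOIN alerts a ON a.id = d.alert_id
WHERE a.source_ip = '192.0.2.1';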