CrowdSec Mikrotik Bouncer fails to add banned ip to address list
it mainly works, but every few hours some banned address is not added to the address list until I manually restart the docker container, then it appears. What could be the culprit?
here's an example of a failed add: the event was not logged at all in the mikrotik container, here's an extract from around the time of the ban
crowdsec_mikrotik | {"level":"info","time":"2025-01-20T13:00:31Z","message":"removed decisions: IP: 164.163.25.225 | Scenario: crowdsecurity/netgear_rce | Duration: -11s | Scope : Ip"}
crowdsec_mikrotik | {"level":"info","time":"2025-01-20T13:00:31Z","message":"164.163.25.225 not find in local cache"}
crowdsec_mikrotik | {"level":"info","time":"2025-01-20T13:39:20Z","message":"new decisions from crowdsec: IP: 118.40.165.223 | Scenario: crowdsecurity/thinkphp-cve-2018-20062 | Duration: 595h59m55s | Scope : Ip"}
crowdsec_mikrotik | {"level":"info","time":"2025-01-20T13:39:20Z","message":"Address 118.40.165.223 already present"}
6744631 │ crowdsec │ Ip:45.130.145.4 │ LePresidente/http-generic-403-bf │ ban │ AE │ 50340 JSC Selectel │ 6 │ 594h27m9s │ 1083
the problem still persists and shows up every few hours
Do you only have one remediation? and do you generally ban for a very long time?
indeed one remediation and the max time, 596h
So it could be that you already have the IP locally via the community blocklist https://app.crowdsec.net/cti/45.130.145.4 as it is marked as malicious (however, I thought the mikrotik should block it before it gets to the application)
And then, because your decision is generated after the fact, the LAPI doesn't send the new one: it returns the longest decision, and CAPI sends a long timer
but if the ip is obtained via the community blocklist, shouldn't it be sent to the mikrotik address list anyway? I receive the alert, and even hours later, if I query that ip on the address list, it does not appear. I then restart the mikrotik container and only then it appears.
The address list is correctly blocked on the mikrotik: if an address is on that list it won't reach the webserver that crowdsec reads the logs from
On the remediation side, is there any config that is used to filter decisions from the LAPI?
no custom config on that side from what I can recall, just connected the mikrotik bouncer to crowdsec with api key
unfortunately it's a recurring issue
maybe this is related?
https://github.com/funkolab/cs-mikrotik-bouncer/issues/39
the reboot workaround is not applicable in my environment, I need a fix or to ditch the bouncer altogether (as it's not working)
I don't know if it's related, it's a third party remediation so we have no hands-on experience with it
@looterino the issue is within cs-mikrotik-bouncer and the fact that it does not expire items from the cache in a similar way as the mikrotik does - I've added a comment in the github issue
frankly speaking, just ignoring the cache on add() would be sufficient 😄
thank you for joining the conversation @KaszpiR
while I continued to monitor this erratic behaviour, I found that the logs were warning me about the db setting use_wal being set to false. Maybe it's totally unrelated, but after setting it to true I didn't have any more problems; although it's a recent edit (about 10 days) and I'm still manually checking whether the ip in every new ban notification is successfully added to the mikrotik address list. I will do that for as long as I can, to be sure that there will be no more false positive ban notifications.
To be more clear: when I received a crowdsec ban notification, sometimes, in what seems to be a random pattern, the ip address was not added to the mikrotik address list and I could only fix that by restarting the cs-mikrotik-bouncer container; after that, every past missing ip was instantly added. This restart workaround was needed even twice a day, every time I saw that the ip from the ban notification was not actually banned.
Is the cache issue you're referring to related to this? or maybe there are several overlapping issues?
Generally the issue is with the bouncer itself and the cache in it: the cache never expires, but it should, so it's a bad cache implementation. It really depends on whether the ip was banned upstream and expired, or whether the ban time was extended. For shortly banned ip addresses it will properly add and remove the ip, thus evicting the cache entry properly. But if, for example, the ip was banned and the ban time is extended, then the bouncer will think the ip is already banned, because it does not check the time of the ban. Thus it will not perform the action of adding it back. That's why restarting helps.
I will try to fix it in the upcoming weeks, because I'm interested in that bouncer. Another solution that I have works, but it causes noticeable cpu usage on mikrotiks (it does a brute-force delete-all and add every 10 min, which is far from ideal)
https://github.com/jellydator/ttlcache this should be easy to implement instead of the current caching, but it will require a bit of refactoring in the code. This way ip addresses can be added to the internal cache and will be evicted from time to time, and it should also be checked whether the ban time changed; if yes, then run an update command on the mikrotik (I am not aware whether just adding the same address with a different timeout value would work or whether it needs delete/add, but that's just to be verified)
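To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is not the bouncer's actual Go code and does not use ttlcache; the class names `NaiveCache` and `ExpiringCache` and the method `should_add` are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class NaiveCache:
    """Hypothetical stand-in for a cache that never evicts entries."""

    def __init__(self):
        self.seen = set()

    def should_add(self, ip, ttl_s, now):
        if ip in self.seen:
            # Treated as "already present", even if the router dropped the
            # entry long ago or the ban was extended upstream.
            return False
        self.seen.add(ip)
        return True


class ExpiringCache:
    """Hypothetical cache that remembers when each ban ends."""

    def __init__(self):
        self.expires_at = {}

    def should_add(self, ip, ttl_s, now):
        new_expiry = now + ttl_s
        old_expiry = self.expires_at.get(ip)
        still_valid = old_expiry is not None and old_expiry > now
        if still_valid and old_expiry == new_expiry:
            return False  # same ban, nothing to push to the router
        self.expires_at[ip] = new_expiry
        return True  # new, expired, or changed ban: push it


naive, expiring = NaiveCache(), ExpiringCache()
for cache in (naive, expiring):
    cache.should_add("192.0.2.10", ttl_s=3600, now=0)

# Two hours later the router entry has timed out and a new ban arrives.
print(naive.should_add("192.0.2.10", ttl_s=3600, now=7200))
print(expiring.should_add("192.0.2.10", ttl_s=3600, now=7200))
```

With the naive cache the second ban is silently skipped until the process restarts and the set is empty again, which matches the "restart fixes it" symptom.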
in my scenario the ban has the maximum time of 25 days, but when @iiamloz suggested that was maybe the problem I tried to reset to default value of 4 hours, unfortunately to no avail, the problem was still there. Do you confirm that changing use_wal to true has no effect on this matter?
thank you for the fix effort
Do you confirm that changing use_wal to true has no effect on this matter?
no idea, but AFAIR use_wal helps with concurrent db updates (sqlite)
I can confirm that was a desperate and misplaced fling of hope, I just saw the problem occur again
the cache fix seems the only reasonable take on this
ok, I managed to update the logging and replaced the cache, now I need to switch to a much smaller blocklist and fix the logic of populating the cache and executing commands on add/remove from the lapi. The code is nowhere near ready to be shared yet
christ, how on earth do I remove the default subscription, CrowdSec Community Blocklist (Lite)
😅 You cannot from the console, but you can update your crowdsec configuration to disable pulling the community blocklist with this option https://docs.crowdsec.net/docs/next/configuration/crowdsec_configuration/#pull
sigh, github.com/go-routeros/routeros/v3 is just awful to use, mainly because the underlying api is awful
ok I guess I coded something that may be useful (but ugly), with a bit better caching and not cooking the mikrotik cpu, and exposing prometheus metrics, ping me if you want it (for now it's not in the container image yet)
for sure, how can I test it out? I'm using docker
I'll make an image today and will post it to quay.io
question: which architectures do you want?
I currently run the amd64 version of the cs-mikrotik-bouncer container
Ok, I'll do it in the evening, in about 4h
@blotus a question about the crowdsec golang lib ( https://pkg.go.dev/github.com/crowdsecurity/go-cs-bouncer@v0.0.16#StreamBouncer), especially regarding the incoming decisions to add/remove an address
I see that sometimes new decision comes in and the given ip gets a new TTL (say bumped from 12h to 24h)
I wonder if such event is updated /repeated after certain time? if so how often?
for example if the new decision is to extend ttl for given IP, is it published once per hour or something like that?
It's probably a blocklist refresh with decisions that long: the go library is "dumb" and just gives you whatever was returned by LAPI.
LAPI will return the longest decision in the database for a given {value,type,scope} triplet ({1.2.3.4,ban,ip} for example).
As to how often this can occur, it's hard to say:
- If the user has a local decision for an IP and creates another one for the same IP, you will get the new decision if the end time of the new one is after that of the existing one
- Blocklist content from the console has a 24h expiration, and will be refreshed from CAPI when it has less than 2 hours left
- CAPI is pulled every 2 hours, so you may (most likely will) see a bunch of updates every 2 hours.
In any case, you will get the "new" decision as soon as you fetch the stream.
Best approach is to just trust LAPI: when you get a new decision, check if it's already in your cache, and if so replace the current one with the new one (because this bouncer only deals with ip/ranges, you can just use the value, there's no risk of conflicts).
sometimes I see:
so there is no change but the decision is set, and I'm not sure if it means 'keep the current ttl for the blocked address' or 're-block the address with the new ttl' (which effectively means extending the ban)
in other scenarios I see , same question as above
(sorry I posted before you answered)
ok, so I assume that if new decision comes in it effectively means 'update ip ban ttl now with the provided value'
yes
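A rough Python sketch of the two rules stated above: LAPI hands out the longest decision per {value,type,scope} triplet, and the bouncer simply overwrites its cached entry with whatever arrives. The `Decision` dataclass and both function names are made up for illustration; this is not the LAPI or go-cs-bouncer implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    value: str       # e.g. "1.2.3.4"
    kind: str        # e.g. "ban" (the "type" in the triplet)
    scope: str       # e.g. "ip"
    duration_s: int  # remaining duration in seconds


def longest_per_triplet(decisions):
    """Keep only the longest decision for each (value, kind, scope)."""
    best = {}
    for d in decisions:
        key = (d.value, d.kind, d.scope)
        if key not in best or d.duration_s > best[key].duration_s:
            best[key] = d
    return list(best.values())


def apply_new_decisions(cache, decisions):
    """Trust the stream: a received decision replaces the cached one.

    cache maps the decision value (ip or range) to its duration.
    """
    for d in decisions:
        cache[d.value] = d.duration_s
    return cache


local_ban = Decision("1.2.3.4", "ban", "ip", 4 * 3600)
blocklist_ban = Decision("1.2.3.4", "ban", "ip", 24 * 3600)

cache = {"1.2.3.4": 12 * 3600}
apply_new_decisions(cache, longest_per_triplet([local_ban, blocklist_ban]))
print(cache)
```

Keying the cache on the value alone is only safe here because, as noted above, this bouncer deals with ip/ranges only.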
why I'm asking: when I tried to implement it, it works, but the RouterOS api is horrible here: if I try to add an ip that is not present, it does it quickly
if it is already there it will say 'got that address, no update', so I need to look up the id of the blocked address, and the lookup is horribly slow (like seconds)
so the best option would be not to make updates if possible and let it expire, so it will be readded with the ttl after some time
though this means there may be a window when the ip is not blocked
(welcome to cache invalidation problem 😄 )
so it will be readded with the ttl after some time
Once you get a decision from LAPI, you will not receive it again unless it gets updated, so you have to apply them when you receive them, otherwise you will miss some
(if I understood what you meant)
I see updates in a slightly unexpected pattern: every 1h and then 1h+15min

yeah, that's what I was afraid of
in that case instead of using in-memory cache it would be better to use redis 😄
I don't think it would change anything ?
if the bouncer stops (and so loses the in-memory cache), the 1st request made by the library to LAPI will return all active decisions, to allow you to get a complete initial state
(and because of that, i'd recommend to clean everything on shutdown anyway if you have some kind of persistent cache)
oh really
so on start it is current state and after that there is a diff only, if I understand it correctly?
yes
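As a sketch of that "full state on start, diff afterwards" behaviour in Python: the function below assumes each stream response carries a set of new decisions and a set of deleted ones, and that deletions are applied before additions. The function name, the argument shapes and that ordering are assumptions for illustration only.

```python
def apply_stream_response(cache, new, deleted, startup=False):
    """Apply one stream response to an in-memory cache.

    cache:   dict mapping ip -> ban duration in seconds
    new:     dict of ip -> duration for added or updated decisions
    deleted: iterable of ips whose decisions were removed
    startup: True for the first pull, which carries every active decision
    """
    if startup:
        cache.clear()  # the first response is the complete state
    for ip in deleted:
        cache.pop(ip, None)
    for ip, duration_s in new.items():
        cache[ip] = duration_s
    return cache


cache = {}
# First pull after (re)start: everything currently active.
apply_stream_response(cache, {"192.0.2.1": 3600, "192.0.2.2": 7200}, [], startup=True)
# Later pulls: only what changed since the previous pull.
apply_stream_response(cache, {"192.0.2.3": 600}, ["192.0.2.1"])
print(sorted(cache))
```

Because a delta is delivered only once, anything not applied when it arrives is lost until the next restart, which is why a restart "repairs" the address list.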
hm, I think I may rethink the current logic of how addresses are processed
(eh some shit panic errors :picardfacepalm: )
sorry for the interruption, but is that panel built upon the prometheus metrics you added? I'm looking forward to implementing this in my grafana instance 🤩
Yes
Ok, I think I've got an idea, as described in the last post of https://forum.mikrotik.com/viewtopic.php?t=204504
In general, on a new decision event: update the existing list in golang memory, create a new address-list in mikrotik, add the addresses to that new address-list, switch the firewall rule to use that new address-list, and let the old address-list expire or just wipe it (drop) if supported.
This way there is no need for any checks whether the ip is already on the mikrotik list
Will have to check how much load it generates
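The swap idea can be sketched in Python against a fake in-memory router. `FakeRouter`, `swap_address_list` and the list names are all hypothetical; a real implementation would issue RouterOS API commands instead of touching dictionaries.

```python
class FakeRouter:
    """In-memory stand-in for a router; not a RouterOS client."""

    def __init__(self):
        self.address_lists = {}  # list name -> {ip: timeout in seconds}
        self.rule_list = {}      # firewall rule id -> address-list name in use


def swap_address_list(router, cache, rule_ids, new_name):
    """Fill a fresh list from the cache, then point the rules at it.

    Returns the names of lists that are no longer referenced by the given
    rules and can be left to expire (or be wiped).
    """
    # 1. Create the new list and add every cached address to it.
    router.address_lists[new_name] = dict(cache)
    # 2. Remember which lists the rules used before the switch.
    previous = {router.rule_list.get(rule_id) for rule_id in rule_ids}
    # 3. Switch each firewall rule to the new list.
    for rule_id in rule_ids:
        router.rule_list[rule_id] = new_name
    return sorted(name for name in previous if name and name != new_name)


router = FakeRouter()
swap_address_list(router, {"192.0.2.1": 3600}, rule_ids=[1, 2], new_name="crowdsec_a")
stale = swap_address_list(
    router, {"192.0.2.1": 3600, "192.0.2.9": 600}, rule_ids=[1, 2], new_name="crowdsec_b"
)
print(router.rule_list, stale)
```

Since only adds to a brand-new list are ever issued, the slow "is this address already present" lookup is never needed.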
so these new list updates would not include custom scenario decisions, did I get that right?
it would
I guess
ok, I did a total rewrite, the only thing that is left is to actually add the creation of the new address list and update the firewall rules to use that new list
Ugh, I tried to update the firmware on my router and it got into a boot loop, fml
I have a virtualized CHR version of the mikrotik router, will your version maintain compatibility?
I tested it on RouterOS 7.x, over the RouterOS api they provide, and looking at the specs it should be pretty compatible, there is nothing special
in general commands are like:
so they are generic commands
I've discovered an issue with connections when the system runs in the loop: when the router died I started to get:
and surprisingly the routeros golang client did not recover from it
@poisynth
- https://quay.io/repository/kaszpir/cs-mikrotik-bouncer?tab=tags multi-arch image (haven't tested it yet, I use a repo within the LAN)
- https://github.com/nvtkaszpir/cs-mikrotik-bouncer/tree/rewrite-to-access-list-swap repo code
sigh, I forgot that sometimes VSCode loves not to save commit messages, there WAS a beautiful wall of text there... 🤦‍♂️
there may be some things missing in the docs, generally the most important things are the env vars, example:
where the last line is important, it defines the ip firewall rules to update, just use /ip firewall filter print and write down the numbers, assuming 0 is a generic packet mangling rule for counting, 1 is for crowdsec drop input, 2 is for crowdsec drop filter
treat it as experimental and prone to errors, but please report them 😉 I will be fine tuning the code in the upcoming days
off to bed
though I have very old grafana (v8.5) so not sure if this would work on the newer ones
MikroTik hAP ax3
cpu load when inserting those new addresses into the new address-list and swapping the firewall rules: overall it spikes to about 11% cpu once per hour
I've got an idea to periodically query the device to list all addresses in address-lists but... this could be misleading if not using a filter, and using a filter is slow
I'll test that asap, thank you for the effort
3 items out of ... 😅

I've added two options, USE_MAX_TTL and DEFAULT_TTL_MAX, which should significantly decrease the number of ips in address-lists and allow faster expiration of address lists, also did a readme overhaul
now that works much better

as I understand it, there's no way to avoid creating those address lists, right?
nope, it's by design
with USE_MAX_TTL=true and DEFAULT_MAX_TTL=4h it gets about 4 to 5 lists, and the total number of IPs in address-lists fluctuates around 20k entries
does DEFAULT_MAX_TTL=4h reflect the default profile ban duration?
no, this value was chosen to avoid having too many address lists with too long bans for too long, because only the address-list actively assigned to the firewall rules is actually important
so the idea is like this:
- crowdsec sends actual decisions + durations
- when a decision comes in, the internal cache is checked
- if the item is not in the list, then it is added
- if the item is in the list, then it is updated (for example longer/shorter ban time)
then, after processing all decisions, we check if there were any changes (added/removed/modified); in that case the items in the cache are processed and certain actions are performed
when iterating over the items in the cache there is a command to add the item to the new address list, and if USE_MAX_TTL=true and the current item's ban in the cache is longer than DEFAULT_MAX_TTL (say 4h), then the effective command will be modified, so that instead of, say, a 3 day ban we give it DEFAULT_MAX_TTL, in our case 4h
so effectively the items in the cache keep the original ban duration, but when sending to mikrotik it is shortened to the desired value to allow the address-list to expire faster
after adding all addresses to the new address-list, the specific firewall rules are changed to use that new address-list, effectively still holding the bans
one hour passes and crowdsec sends an update, so another list is created, again with truncated time if needed, and the firewall rule is updated to use the new address-list
so the current address-list is active, but all previous ones are not; they are not needed anymore, and because they had shortened bans, they will expire by themselves faster
this helps to avoid having a massive amount of address lists, each of them having items with a long expiry time
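The capping step described above, as a small Python sketch. The function and parameter names are illustrative only; `use_max_ttl` and `max_ttl_s` stand in for the two env vars mentioned in the thread, and this is not the bouncer's actual code.

```python
def effective_timeout(ban_s, use_max_ttl, max_ttl_s):
    """Timeout actually sent to the router for one cached ban."""
    if use_max_ttl and ban_s > max_ttl_s:
        return max_ttl_s  # truncate long bans so old lists expire sooner
    return ban_s


def build_address_list(cache, use_max_ttl, max_ttl_s):
    """Entries for a new address-list; the cache keeps the original durations."""
    return {
        ip: effective_timeout(ban_s, use_max_ttl, max_ttl_s)
        for ip, ban_s in cache.items()
    }


HOUR = 3600
cache = {"192.0.2.1": 3 * 24 * HOUR, "192.0.2.2": 1 * HOUR}
new_list = build_address_list(cache, use_max_ttl=True, max_ttl_s=4 * HOUR)
print(new_list)
print(cache)
```

The 3 day ban goes to the router as 4h while the cache still holds 3 days, so the next generated list bans the address again for as long as the decision is active.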
hm, I think I will have to update the code tomorrow for edge cases, such as when there is no decision for a certain time and so no update; I will have to add a periodic insert + fw rule update, at an interval lower than DEFAULT_MAX_TTL
haven't seen such a use case yet, though, there was always an ip add/remove or a duration change
is there a minimum TTL time?
besides the crowdsec community ip ban updates, I have a custom scenario that can ban other addresses with a custom ban duration, does the new address-list configuration influence those custom bans too?
No, there is not, but I suggest setting it to 2h if you get updates once per hour
If you have custom ban durations they should be respected in the cached list
I may need to add an extra check to avoid bans without a time limit, because that would cause the address list to stay forever
added a separate loop to update the mikrotik, by default it runs every 1h
and the latest image contains the above and an extra metric to process perm bans (ttl=0), but I'm not sure if I coded it properly so it's marked as 'fixme'
will try it out asap, thank you
ok I made some cleanups in the repo and it's easier to make releases and image builds from tag
GitHub release "Trigger update if incoming decision is from cscli": Previously if you used cscli decision add then the code would take some time before actually applying the ban on the MikroTik - by default up to 1h, which was not ideal :). Now with setting UPDATE_...
now you get bans updated really quickly when you add them via cscli; previously it lagged noticeably (by default up to 1h), now it takes about 5s from the cscli command
https://github.com/funkolab/cs-mikrotik-bouncer/issues/86#issuecomment-2927248727 that was a bit unexpected 🙂 I thought that repo was more active
Yeah we know the guy who runs it personally as he is a friend of our C levels.
imo, we should keep both, as merging your changes into their repo could disrupt users; however, if there were comms and a redirection then at least users could be informed where to go. We don't have this capability on the console, but I pinged the team internally, so we are going to make some plans on what we should do:
- Check the repo status; if archived, then warn users
- Repo maintainer can update the readme to point to a repo; however, we should redirect users to the alternative if it exists in the hub
We talked and I decided to make a standalone fork under different name, it will be way safer
I'm surprised they wanted to archive the repo, but at least the owner is willing to give an idea where to look for an alternative
GitHub - nvtkaszpir/cs-mikrotik-bouncer-alt: A CrowdSec bouncer for MikroTik RouterOS appliance, alternative
Added minor fixes, especially mutex to prevent errors due to race conditions
hm I think i'll need to add an option to insert also egress fw rules, not just ingress, may help for people fighting with c&c botnets
that's amazing, keep it up <3
I haven't got any feedback from anyone...yet. So, any obstacles except life?
Ok, I also implemented updating firewall rules for dst-address-list. It's not pushed to the repo yet because it needs a bit of testing, and there will also be a breaking change in the env var values (just adding a suffix to the env var name), and of course I need to do the docs update. In addition, I need to add a metric for the time the app spent waiting for a lock and the time it took to update the mikrotik; for now it can be deduced from the logs, but that's not convenient. I hope to make it available tomorrow, probably.
personally I'm away from home for some time and I still need to test it thoroughly, I'll gladly give you feedback
v0.4.0 released, breaking, see the readme
some changes (nothing app specific, so no need for new tag)
- deploy directory for docker-compose and kubernetes
- grafana dashboard + screenshots
- docs update (especially about metrics)
Duuuh, I enabled Issues in the repo, totally missed the fact that it was not enabled by default on GitHub. So now there is less confusion when reading the readme section about contributing XD
Small feature added: the ability to control how frequently decisions from the streaming local api are processed. The default value is 10s and should be enough, but for larger lists and slower devices it may be better to increase it if needed. More in the readme.
I will be able to do a dry run on your version soon, can't wait
ps: on this latest feature, could the 10s be lowered?
Yes, down to 1s, but the default is 5s for a golang bouncer client, though I would not really suggest setting it lower than your average time spent on updating mikrotik devices.
If, for example, you get an update every 5s and your time to update the mikrotik is 10s, then there will be a noticeable amount of time spent waiting for an update, and this blocks other updates, so they can start to stack. Also, every update generates a new, almost full address list, so if you have a lot of updates it can start to waste device resources such as memory.
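The stacking effect can be illustrated with a toy Python simulation. It assumes that every polling tick produces an update and that updates are applied to the router strictly one after another, which is a simplification and not a description of the bouncer's real scheduling; `backlog_after` is a made-up name.

```python
def backlog_after(interval_s, update_time_s, elapsed_s):
    """Count updates that arrived but have not finished after elapsed_s."""
    router_free_at = 0  # time at which the previous update finishes
    unfinished = 0
    arrival = interval_s
    while arrival <= elapsed_s:
        start = max(arrival, router_free_at)  # wait for the previous update
        router_free_at = start + update_time_s
        if router_free_at > elapsed_s:
            unfinished += 1
        arrival += interval_s
    return unfinished


# Updates every 5s, each taking 10s to apply: the queue keeps growing.
print(backlog_after(interval_s=5, update_time_s=10, elapsed_s=60))
# Updates every 10s, each taking 5s to apply: the router keeps up.
print(backlog_after(interval_s=10, update_time_s=5, elapsed_s=58))
```

In this model the backlog only stays bounded when the interval is at least as long as the time one update takes, which is the rule of thumb given above.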
A new version is on the way and will support firewall raw blocking, which may be more performant in certain scenarios. It will need to change some env var names, the rest stays as is. I'll probably release it tomorrow, for now I'm testing the image.
GitHub release "Breaking: Support Firewall Raw": Previously only firewall filter was available and used, now this version allows to use firewall raw, which helps to reduce load of cpu/memory, especially on low resource devices. It implements #2
awesome change to use raw table!
docs refactor, now they are available under https://nvtkaszpir.github.io/cs-mikrotik-bouncer-alt/
v0.8.1 - just a small patch https://github.com/nvtkaszpir/cs-mikrotik-bouncer-alt/releases/tag/v0.8.1
Release "Propagate all metrics on the start": quick patch to always propagate metrics on start with zero values, helps to avoid null values in grafana dashboards.
if you are on v0.8.0 no need to update