Bazzite dropping received packets from local networks when Ubuntu didn't?
So I recently swapped from Ubuntu to Bazzite (mostly because of snap finally enraging me for the last time).
However, one thing I've been noticing is that for some reason Bazzite has been dropping packets (a lot of packets) received from some, but not all, local addresses.
I THINK it might be a problem with how it's trying to route things, but I can't be sure, hence my question here.
We have 4 devices in our network (there are a lot more than that, but this is a minimal example): a router (192.168.1.1), our PC (192.168.1.244), and our destination address, a docker/portainer server (192.168.4.0/24, with the portainer server itself as 192.168.4.1/32).
Pings from the PC to the router work fine, and vice versa. Pings from the docker server to the router and PC work fine.
But pings, ssh, and all general traffic from the PC to the docker server (and everything within it) are weird.
If I set up tcpdump on both the PC and the docker server, I can see my packets going out from the PC, being received by the server, and the server responding.
However, tcpdump on the PC never sees the response packets.
I THINK there's a weird routing bug here. route -n gives:
Is it trying to route anything in 192.168.0.0/16 EXCEPT 192.168.1.0/24 over the router, and then discarding the replies because they come back without going through the router, or something like that?
Even more interesting, a ping from the PC to the docker server has 100% packet loss... right up until it shows this sort of message:
Meaning for some reason Bazzite has to keep being told that the server is local, which works for a little while until it apparently 'forgets' and has to wait for another ICMP_REDIRECT.
Any ideas what's going on? It seems to be Bazzite-specific: when I dual-boot into Windows or Ubuntu on the same machine it doesn't happen, and no other machine on the network has any problem with this.
That IP routing table is entirely normal.
It has one route that matches everything (0.0.0.0), which correctly points to your local router at 192.168.1.1.
The other route means that anything that matches the 192.168.1.0/24 subnet is routed locally.
Shouldn't 192.168.0.0/16 be routed locally?
No, because your subnet is 192.168.1.0/24 by default
the problem is with anything not in 192.168.1.0/24, and only on bazzite
that's because anything not on 192.168.1.0/24 is sent to your router (gateway)
everything within that is directly sent to your network interface and operates on layer 2
that's just your LAN
Read that table as: anything that's not inside your LAN (192.168.1.1 through 192.168.1.254) is sent to your router (gateway), which then routes it to your ISP for further hops until it reaches its destination
destination 0.0.0.0 means everything
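A minimal Python sketch of how that table is read, assuming only the two routes described above (a default route via 192.168.1.1, and 192.168.1.0/24 delivered directly on the interface). The ROUTES list and next_hop() helper are hypothetical illustrations, not the kernel's implementation; the rule shown is longest-prefix match, where the most specific matching route wins.

```python
import ipaddress

# Hypothetical model of the PC's table as described above:
# a default route via the router, plus the on-link /24.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),  # default route -> gateway
    (ipaddress.ip_network("192.168.1.0/24"), None),      # on-link, no gateway
]


def next_hop(dst: str) -> str:
    """Longest-prefix match: among routes containing dst, the longest prefix wins."""
    addr = ipaddress.ip_address(dst)
    candidates = [(net, gw) for net, gw in ROUTES if addr in net]
    _, gw = max(candidates, key=lambda route: route[0].prefixlen)
    return f"via {gw}" if gw else "on-link"


for dst in ("192.168.1.1", "192.168.4.1", "1.1.1.1"):
    print(dst, "->", next_hop(dst))
```

With only these two routes, 192.168.4.1 matches nothing but the default route, so it is sent via 192.168.1.1 even though it hangs off the same switch.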
It's not so much the ip routing table that's making me suspicious, it's the fact that it all fails, and only for bazzite, until the router says 'hey, just do this locally'
and then it works until it's forgotten
it's possible; is it a Realtek Ethernet adapter?
they are prone to problems
Are you on a big corporate network or something, so that pings to 192.168.4.1 actually work?
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 16)
though like I said, if I dual-boot into Ubuntu 24.04, it works fine
since that's not your LAN im a little surprised
yes, 192.168.4.1 is part of the network
wait you're using a /16 subnet?
and pings- and all traffic- normally work for it
that's such a horrible idea for performance reasons
has never been a problem for anything else that isn't bazzite. This network has been set up like this for years
do you have more than 254 devices on your subnet?
yes
why no router in between?
that saves your network
sorry im a network engineer, but more than 254 devices on a single subnet is a horrible idea
I also work in networking, and that's why there's that many devices
i have a server rack for skunkworks
then why no subnetting?
why put everything in a single subnet
you know what happens when 1 device starts to broadcast on that subnet
in any case, that's why things are broken out: 192.168.1.0/24 is 'home' stuff, 192.168.4.0/24 is services, 192.168.16.0/24 is VPN
it clogs your bandwidth
etc etc
ohhhh
gotcha
no that's fine
you had me confused for a moment
I figured lol
earlier you said you had everything on /16
I was like 'yyyyess? That's why there ARE subnets, that's kind of the whole problem right now...'
if you're getting packet loss within your LAN, I would look where the packets are being dropped
Ah, I was being general when I said "Shouldn't 192.168.0.0/16 be routed locally?"
as in 'shouldn't local packets be routed locally if at all possible'
kernel routing tables work on layer 3, so it needs to route it as layer 3 to the NIC
which they can be, as evidenced by the ICMP_REDIRECT
when the NIC receives the packet, it's handled at layer 2
so that's why it seems like there's a route when in fact there's none
kernel just otherwise doesn't know where to send the packet
and to which interface
192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 enp6s0 just means send it to this interface
after that it's up to the NIC to do what it wants
the route above means send everything that ISN'T 192.168.1/24 to the gateway through interface X
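To connect that to the raw output, here is a small Python sketch that turns one route -n data line into CIDR form. It assumes the usual net-tools column order (Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface); parse_route_line() is a hypothetical helper, and a Gateway of 0.0.0.0 is read as "no gateway, deliver directly on Iface".

```python
import ipaddress


def parse_route_line(line: str) -> dict:
    """Turn one data line of `route -n` output into a small dict (illustrative helper)."""
    # Column order printed by net-tools: Destination Gateway Genmask Flags Metric Ref Use Iface
    dest, gw, mask, flags, metric, _ref, _use, iface = line.split()
    # Combine destination and dotted genmask into CIDR form, e.g. 192.168.1.0/24
    net = ipaddress.ip_network(f"{dest}/{mask}")
    return {
        "network": str(net),
        # Gateway 0.0.0.0 means no gateway: deliver directly on the interface
        "gateway": None if gw == "0.0.0.0" else gw,
        "flags": flags,
        "metric": int(metric),
        "iface": iface,
    }


print(parse_route_line("192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 enp6s0"))
```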
yeah, I know
there's only one NIC anyway
anyway, as far as I can tell, the packets are being lost at the bazzite PC
they look like they're being passed through the router just fine
granted, the tcpdump interface on edgeOS is kind of crappy
TTL?
wait not possible unless more complex network with multiple routes
yeah, I wouldn't think it could be ttl, it wouldn't work on this pc no matter what os it was running in that case
if it's TTL, a route is being looped
unless you got like 10 routers hooked up on each other
nah, the setup is pretty simple
the ER-X router has three copper links: one toward the cable modem, another to the cell modem (dual WAN), and a third toward the 48-port switch
for now, at least, everything is off the switch
everything on a single switch?
yeah
might wanna check power consumption and bandwidth on that thing
most of the devices are either from the AP, or from the proxmox or portainer servers
power consumption is good, bandwidth isn't even close
i think bandwidth is about 4-5MBps
well within its capabilities
I try to keep a quiet network
how's packet loss on just normal internet traffic?
1.1.1.1
it helps that all my cloud IoT devices have been evicted, local-only stuff. They're much quieter
no packet loss or issues out to the internet at all
:huh:
and within subnet?
also none?
yep, no issues there
even to the dns server that's dual-homed
then i would check the router
are you using like RouterOS or something?
nope, standard EdgeOS on an ER-X
tcpdump says it's getting passed along
PC doesn't see any of it
normally I would say that indicates something with the router, since it would be pre-firewall on the router and pre-firewall on the PC
but only traffic coming through that router is seeing packet loss
but that wouldn't make sense for it to only happen on one specific OS
same router also does the dual wan?
yeah
or is that on the switch
nah, on the router
maybe windows has bigger timeout values
I'm not this advanced in specific NIC stuff
only routing and switching
are you sure that right now using this topology you get 0 packet loss in Windows?
and ubuntu, yeah
triple boot: I have a Windows SSD, an Ubuntu SSD, and a Bazzite SSD.
is it a 2.5G?
the bazzite install is recent
or 1G?
I know Realtek has some probs with 2.5G in Linux still
sometimes manually forcing the link to 1G fixes all the issues
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 16)
Oh, it's Gigabit
What version of Ubuntu?
25.04?
24.04 LTS iirc
25.04 has the same major kernel version as we do right now
6.14
Could you try booting up a fedora 42 livecd
and testing the same pings
also has 6.14
also just confirmed with ethtool, enp6s0 is 1000Mb/s, full duplex
Yeah, I scrolled up and saw Gigabit. It's not these NICs that had issues AFAIK
it's the 2.5Gbit ones
they are pretty common on the new AMD and Intel boards
2.5Gb was always such a weird slot to me.
just do 10gb.
or lagg 4x1gb
Just telling you right now: realtek+linux=you don't know what's going to happen

Meme exists for a reason
./shrug had this mobo for a few years now, never had any problems with it yet
knocks on wood
livecd is writing
Now THAT is interesting
0% packet loss
But!
Constant icmp redirects

that means your routing tables are f'd
Sorry for taking a picture of the screen, but it's a livecd so I have no way of transferring a screencap
which is also why internet traffic works fine
you need to find the issue in your internal routing table in your edgeos
or are you just doing interface routing?
Why would every other device not have this issue if it's the router? Any ideas?
Also any other tests while I'm still in the livecd before I swap back?
an unnecessary redirect is usually a sign of routing confusion
by the router
completely pointless redirect
also make sure every nic has the right properties
subnet, ip address, gateway
None of that's changed recently; the only new or changed device on the network is the PC switching to Bazzite
i think if you were to disable icmp redirects, your routes would no longer work
but that would be the best way to figure out what is wrong
Yeah it might be a kernel thing
I'm not sure
But you should definitely try to fix your redirects
sorry, had to load back into main pc
yeah, I'm checking routing
nothing jumping out at me
Hail Mary time: power-cycling the router
(if this works I'm going to be confused)
ICMP redirect messages are almost always generated when a packet is sent to the router but is sent back over the same interface
yep, I know
router will then tell you using ICMP redirect that it was pointless to send it to the router
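A simplified Python sketch of when a router decides a redirect is warranted, loosely following the conditions in RFC 1812 (Requirements for IP Version 4 Routers): the packet leaves on the same interface it arrived on, and the sender is on the same subnet as the next hop. The interface names ("lan", "wan") and the assumption that the router's LAN side is 192.168.0.0/20 (the subnet mentioned later in the thread) are illustrative; real routers apply more checks than this.

```python
import ipaddress


def should_send_redirect(in_iface: str, out_iface: str, src: str,
                         next_hop: str, out_subnet: str) -> bool:
    """Simplified redirect check: same in/out interface, and the sender
    could have reached the next hop directly (both inside out_subnet)."""
    if in_iface != out_iface:
        return False  # packet actually changed networks; nothing to correct
    subnet = ipaddress.ip_network(out_subnet)
    return (ipaddress.ip_address(src) in subnet
            and ipaddress.ip_address(next_hop) in subnet)


# Hypothetical case: the PC (192.168.1.244) asks the router to reach 192.168.4.1,
# which the router sees as directly connected on the same LAN port.
print(should_send_redirect("lan", "lan", "192.168.1.244",
                           "192.168.4.1", "192.168.0.0/20"))
```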
but it shouldn't really affect too much
but maybe kernel bug
oof that was depressing
for a moment that actually looked like that somehow fixed it lol
and then right as I was about to type in here '??????'
suddenly it started failing
could you try rebasing to testing on bazzite?
it has a minor kernel bump; maybe it helps
6.14.4
one sec, lemme look up how to do that
I'm new to bazzite
bazzite-rollback-helper rebase testing
wait
maybe its a better idea
to roll back to 6.13
what does rpm-ostree status
say?
can probably get rid of that printer driver, it's now attached via USB to the printserver
Wait, you're still on 41?
Is there a reason?
I didn't know there was another update?
ujust update
might have to do rpm-ostree reset
first
as I said, new to bazzite
that removes layered packages
No problem
do the reset before the update to prevent bootc going into limbo, especially on a major version upgrade lmao
so it looks like Discover takes care of packages, and then I just manually run ujust update every now and then to update os?
can layer again after update
Yes, you can also just click on system update
it also updates your distroboxes etc
if you have them
nah, don't need the driver anymore anyway. as I said, it's connected to the existing printserver
lol i always prefer network printing
less problems
I was just setting it up. easier to get things calibrated when it's not two flights of stairs away
usb printing is the worst time of your life
always
interesting, I've always had way more problems relying on the printer's terrible network stacks
it's why I have a printserver
so I can have network print... sorta
anyway brb gotta restart for the layer to be removed
back, it's fetching the ostree chunks
yep, and then you gotta reboot again
normally you don't have to reboot first
for the next time
rpm-ostree reset and then rpm-ostree upgrade should be fine for just os
lets hope the kernel upgrade will fix your issue, otherwise I'm at a loss here
Not many people have anything other than 192.168.1/24 and a gateway to the internet
Myself included
back annnnd pings are failing
pings same subnet and 1.1.1.1 are fine?
still?
yep
192.168.4.1 is still seeing the pings and is sending ICMP echo replies
but 192.168.1.244 isn't seeing the replies
so RX is f'd
for some reason
yeah, always has been
but still passing through router?
did I not have that in the og post?
and switch?
one sec, lemme get tcpdump up on the router too
if it still passes back there, it's definitely the NIC dropping the packet for some reason
wait there's an easier way
just do
watch /proc/net/dev
yep I'm seeing the replies
on the router
what am I looking for?
I see bytes and packets incrementing, of course
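For reference, a rough Python sketch that pulls the receive-side counters for one interface out of /proc/net/dev, so the drop and errs columns can be checked while a ping is running. It assumes the standard layout (two header lines, then "iface:" followed by 8 receive and 8 transmit fields); rx_counters() is a hypothetical helper, and enp6s0 is the interface name from this thread.

```python
# Receive-side column names in /proc/net/dev, in order
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]


def rx_counters(proc_net_dev_text: str, iface: str) -> dict:
    """Return the receive counters for iface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, data = line.partition(":")
        if sep and name.strip() == iface:
            values = [int(v) for v in data.split()]
            return dict(zip(RX_FIELDS, values[:8]))  # first 8 fields are RX
    raise KeyError(iface)


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            counters = rx_counters(f.read(), "enp6s0")
        print("rx drop:", counters["drop"], "rx errs:", counters["errs"])
    except (OSError, KeyError) as exc:
        print("could not read counters:", exc)
```

If these counters barely move while replies go missing, the frames are probably not being discarded at the NIC/driver level.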
watch "ethtool -S enp6s0"
might be easier
so you can actually see the meaning of the values
now ping
I am
I never stopped it from back where I said "yep I'm seeing the replies on the router"
netstat -i
what does that say?
sorry i not llmao I was about to say
"a lot"
so there is some drop but not much :huh:
yeah, it should have hundreds, this ping has been running for a looong time
this is just frames not packets btw
351 packets sent, in fact
ah, ok
this is datalink layer
wait
WAIT
oh wait no nevermind, I mixed my shit up and got excited for no reason
I'm used to working in layer 3+, for a second I thought that was mDNS request and was like "wait, why is the dns server not handling this!"
but that's ARP, not mDNS lol
Hahaha
the dns server better not be involved in that lmao
sudo dmesg | grep -i enp6s0
I just saw 'request who-has docker.lan tell' and I was like "WAT"
maybe this says something?
weird that it's getting renamed and flaps twice before stabilizing 26s into bootup
but also probably not related?
that is weird
that's actually weird
that shouldn't happen
lemme see what's arou
realtek btw :huh:
can you do
journalctl -b 0 | fpaste
that's a lot of I/O and buffer errors
I don't suppose sr0 is networking-related, is it?
ah, nah, sr0 is the cdrom
everything from r8169
watch -n 1 ifstat
then do more (if possible fast) pings
well you can see for yourself if you get drops on RX
pretty visible there
that's -f on Linux, correct?
-f?
oh you mean ping. sorry i haven't slept yet
well I was firing off about 4500 pings/sec....
no errors or drops
lmfao
what did you do
ping -i 0.000000001
xD
ping -f 192.168.4.1
-i is also a thing for ping interval
when it says '-f is for flood' it apparently damn well means it
easy LAN DoS attack
can't even ping that fast without sudo
hahaha
wtf that is new to me
it doesn't let you if you don't sudo
correct
not the worst packet cannon I've unleashed
I wrote an snmp poller for work one time that could do 1gb/s
I wanted to find out if it could go higher, but nobody was willing to install it on something with a larger NIC
my router seems to ignore ping flood
it still just replies every like 100 pings
but nothing more
icmp_seq just jumps by 100 on every reply lmfao
honestly that makes more sense than accepting them
I'm at a loss here now
Just realtek driver weirdness caused by unusual home setup idk
wait one last thing
journalctl -b 0 | fpaste
that could maybe still show something
that's just your journald
boot log
sure. gimme a sec to make sure there's nothing that needs sanitizing
probably not, but probably good practice anyway lol
IT JUST KEEPS GOING jfc
.... why is this 16000 lines
is that what you're intending to have uploaded?
https://paste.centos.org/view/8e84ae2f
Do you have any manual network configuration done on this install of bazzite?
besides the most basic stuff such as static ip
like something that wasn't done on the livecd for example
nope, never touched any of it
IP address is from DHCP on the router at 192.168.1.1
about the only thing even close is that the router has a static mapping for the mac/ip combo so it doesn't move around
I do know why your journald has so much noise
Discord's general existence?

Yes
yeah it did that on ubuntu too, but worse
im glad i just use browser discord
apparmor lmao
I'm slowly wearing my friend groups down on getting off discord and onto signal or matrix
I don't know what else we could do to try and debug the issue
one little thing you could try
systemctl stop firewalld
and do ping again
maybe its interfering somehow
don't know how it could, but it's so simple that it might just be worth a try at least
just do systemctl start firewalld
again if it changes nothing
nope
can you do
mtr 192.168.4.1
shows detailed traceroute
it will also show if there's any loop
it's showing nothing at all
:huh:

that is just weird

ping randomly started working again
annnnd it's stopped
it could just be the router deprioritizing pings
maybe
now this is interesting
also
there should be at least 2 hops
ping isn't seeing any responses
why is there only 1
but the loss % is still going down
oh, it's because of the redirect
obviously
im dumb

however, ping is showing nothing
tcp dump is showing replies
can you do 1.1.1.1 (don't show me it will show your public ip)
the fuck is going on
I would ask if this is a bug in ping, but I still can't reach stuff on 192.168.4.1
try real traffic
not this icmp bs
try generating some tcp traffic
nada response
so the ping responses that I was seeing were from mtr
so apparently if you have an active connection going, it stays connected even when it drops otherwise
also ping 1.1.1.1 is doing fine, averaging about 14ms
not a single packet lost?
none
yeah im at a loss here, it seems to be something really specific to your setup
but still somehow only happening in bazzite?
it makes no sense
Could be something in the network stack perhaps that ubuntu handles, but fedora doesn't?
or the driver
there are some rules in /etc by default
that fedora uses
otherwise we don't touch that stuff as far as I know
maybe I'll use this as an opportunity to recreate my router config anyway and ditch the IoT VLAN; I don't actually need it anymore since I got rid of the last of my cloud IoT devices a few months ago
if nothing else that'll simplify whatever the fuck is going on xD
(no traffic should be getting tagged for iot for this, for the record)
just another crazy thought, what happens if you put the other device you're pinging on the same subnet temporarily?
will it ping and tcp just fine?
it could still very well be a weird realtek kernel driver bug
because your setup is super specific and somewhat complex
not sure, haven't tried that yet, mostly because 192.168.4.1 is specifically a portainer server, so I'm not just moving one thing, I would be moving EVERYTHING on it
and then of course changing all the DNS
then just move the bazzite machine to 4.x xD
i guess
prob easier then
fair enough
brb
wait
bad idea
oh ok
oh wait nvm I can deal with that
I was like 'router is 192.168.1.1, if it happens in reverse then I can't get back into the router to undo it'
but I have a laptop I could get in on lol
:clueless:

You have no idea what I've managed to screw up
I actually am trying to get a terminal server for the rack xD
adding allowed vlans on port-channel
but forgetting add
oops
runs to switch
testing testing?
because that destroys the management vlan
ok well I still have internet
and can no longer access switch
oh right
can still ping router
and I can't ping 192.168.4.1
What
the fuck
currently 192.168.4.244
you should always be able to ping the router, right? Don't you use multiple interfaces?
on one hand, yes
on the other hand, we have not gotten to this point because things are behaving sanely.
you just ping it on a different ip
192.168.4.1 would normally be the ip you'd put the interface on that side of the router
well some people like 254
ew
if you can ping router 192.168.1.1 on 192.168.4.244 what kind of cursed setup do you have?
if /24
i mean i guess
/20
š
192.168.0.0/20 and 192.168.16.0/20 are the subnets
iirc
but then you also do more localized subnetting on /24?
yeah
i like to avoid such setups
hahahaha
It's mostly to keep the VLANs separated
192.168.0.0/20 is VLAN 1, 192.168.16.0/20 is VLAN 16
slightly weird subnetting is preferable to ending up in VLAN hell
imo
but you want the bazzite machine to only be able to access 192.168.1.x so you do /24
though like I said, vlan is going away anyway
should be fine if router is 192.168.1.1 then
but this is probably why you're getting icmp redirects
possibly, though that shouldn't cause ~95% packet loss
especially with that 5% being as streaky as it is
because 1) your bazzite machine sends the packet to the gateway because it's beyond the subnet 2) but the ip is in the router's subnet so it sends it back over the same interface
that's afaik not a very good thing to do
i kind of want to
ip route add 192.168.0.0/20 dev enp6s0 proto kernel scope link src 192.168.1.244 metric 100
and then delete the existing route
just to see what happens
why not just change the interface?
to /20
just do it in KDE/GNOME settings
that should update it automatically
I don't actually see a place to do that?
I looked, could just be missing it
oh wait
you're using DHCP right?
yeah
just change the DHCP to do /20 subnet mask
that's by far the easiest way
255.255.240.0
otherwise use static ip+subnet mask of ^
if you do want to do it manually (and thus use static ip) assuming you use KDE you just go to Wi-Fi & Networking in KDE settings and then select the interface and then IPv4 and then switch from Automatic to Manual
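A quick Python check of the /20 arithmetic above, using the standard ipaddress module: it confirms that /20 and 255.255.240.0 are the same mask and shows the range 192.168.0.0/20 covers. Only addresses from this thread are used.

```python
import ipaddress

lan = ipaddress.ip_network("192.168.0.0/20")
print(lan.netmask)           # 255.255.240.0
print(lan[0], "-", lan[-1])  # 192.168.0.0 - 192.168.15.255
print(lan.num_addresses)     # 4096

# The dotted-quad mask and the prefix length describe the same network
same = ipaddress.ip_network("192.168.0.0/255.255.240.0")
print(same == lan)           # True
```

Both 192.168.1.244 and 192.168.4.1 fall inside that range, so a /20 lease puts them on the same subnet.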
still no ping
what if you initiate the pings from the other side?

that's always worked fine
yeah see that updates automatically based on your interface settings
looks fine
yep, I did it through the interface settings
I just figured that was easier to read than a screencap of the GUI
but yeah here's some wild shit

SIMULTANEOUSLY

is this server also on /20?
it's got static addressing, no dhcp
yes, but still has an ip and subnet mask
and gateway
one second, trying to find a way to actually check, it doesn't have ifconfig or anything else I can think of on it xD
just ip
ip
type that
ip address
192.168.4.1/32 :huh:
so its on its own router interface?
now i know what's going wrong
the portainer server is running in a container on the proxmox server
the problem here is that the bazzite machine will treat it as a packet that doesn't need routing
yet it needs routing because 192.168.4.1/32
makes sense you can't ping
except occasionally it can
at semi-random, granted
im assuming because of the icmp redirect
well, lets try changing that to 192.168.4.1/20, I presume?
yes, but that does mean the entire subnet has access to it
without any routing
if you meant for that to have its own subnet
it needs its own router interface
or at least virtually
no, it's just for delineation
192.168.4.0/24 is just 'services' to put them in an easily understandable block
yeah i understand
that way it's not just 'shit was that 192.168.1.224 or 192.168.1.242?'
xD
OK, so preliminary results are looking good
it can work the way you do it, but I feel like it's really complicating things unnecessarily. When I had an internship at a hospital it was all just 10.x.x.0/24 subnets, each separated into its own VLAN
Piped is loading, the wiki is loading, portainer is loading, obviously the nginx reverse proxy is loading
because its just simpler to work that way
the way you're doing things sounds like some cursed datacenter rerouting stuff
outside of my current knowledge too to be honest
haha. in my defense, I'm a programmer, i'm not a network eng, my JNCIA was like 10 years ago.
and the CCNA was even farther.
you know what was happening
and why RX was failing?
also the first time I had used VLANs outside of Juniper, which has a much saner implementation than EdgeOS
nope. Also curious why this worked for like, 5 years with no problems for everything but bazzite
if you have /32 you need to have a very specific route configured to the server
or it will be dropped
192.168.4.1/32
means that ip and nothing else
Sorry let me correct myself here: 192.168.4.1/32 would mean that it would have its own router interface and would only be an endpoint
and you can't use that ip for any other subnets, but because you're using /20 that ip is included in the subnet
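To illustrate the mismatch being described, a Python sketch of how each end decides whether the other is directly reachable: an address is treated as on-link only if it falls inside the local interface's own prefix. on_link() is a hypothetical helper; real hosts also consult explicit routes, so this only shows the subnet-mask part of the decision.

```python
import ipaddress


def on_link(local_interface: str, peer: str) -> bool:
    """True if peer is inside the subnet configured on local_interface."""
    return ipaddress.ip_address(peer) in ipaddress.ip_interface(local_interface).network


# Before: PC on a /24 lease, portainer server configured as a /32
print(on_link("192.168.1.244/24", "192.168.4.1"))  # False: not on-link, PC uses its gateway
print(on_link("192.168.4.1/32", "192.168.1.244"))  # False: a /32 contains only itself
# After: both ends on the /20
print(on_link("192.168.1.244/20", "192.168.4.1"))  # True
print(on_link("192.168.4.1/20", "192.168.1.244"))  # True
```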
Does it work now?
it seems to work now
Avoid using IP addresses already part of a subnet
I'm still fascinated why it worked before
as endpoint address
because of your router being smart
100%
the redirects saved you
You can do what you want to do, but don't use anything inside those /20 subnets
you can even do 10.0.0.1 if you want, as long as it's not part of another subnet
weird thing is I'm pretty sure I never saw them before. I'm pretty sure I was setting that up with a fair bit of trial-and-error (as you can proooobably guess lmao)
ICMP redirects would definitely have been a red flag
maybe ubuntu's ping just doesn't show them?
just make sure you got a route made in the router (or use default routing, it usually suffices for small networks)
anyway, thank you a ton for the help
You definitely went above and beyond lmao
5 and a half hours lmao
no problem, i love networking
it's my job after all
still a junior
but i got my CCNA
and am learning for CCNP
I have not used any of those skills in a very long time lmao
I don't really config routers or switches, I basically make/run all the tools/monitoring/systems/etc the Actual Network Engineers use xD
it's part of the reason for trying to do such a complicated, (semi) overbuilt setup- to at least keep some kind of foot in
apparently the foot got bitten off and I've been on a peg leg this whole time
(also even funnier, the fucking dual-homed DNS server is set up correctly lmao)

Hahaha. By the way, the best way to have a single server with only 1 interface (so 1 IP) connected to a router is by using /30 meaning you have 2 usable IPs: one for the router interface and one for the server
/30 is also called point-to-point connection
because you get 2 usable IPs
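The /30 arithmetic, checked with Python's ipaddress module. 10.0.0.0/30 is just an example block (in the spirit of the 10.0.0.1 suggestion above); hosts() excludes the network and broadcast addresses, leaving the two usable IPs for the router interface and the server.

```python
import ipaddress

p2p = ipaddress.ip_network("10.0.0.0/30")
print(p2p.num_addresses)                           # 4
print(p2p.network_address, p2p.broadcast_address)  # 10.0.0.0 10.0.0.3
usable = [str(h) for h in p2p.hosts()]
print(usable)                                      # ['10.0.0.1', '10.0.0.2']
router_ip, server_ip = usable                      # one address for each end of the link
```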
ok, 'correctly'
in WAN land they use /31 nowadays to preserve IPs, but for home stuff avoid that shit
because not everything supports it
with this i mean that it is directly connected to the router and thus just uses your router's "implicit routing" so not needing a static route
because there's only 1 device attached to the port
implicit routing is just eth0: 192.168.1.1/24 eth1: 192.168.2.1/24
if a packet from eth0 has destination IP 192.168.2.244 it just sends it to eth1 because its part of that subnet, no routes needed
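The "implicit routing" example above as a Python sketch: the router picks the egress interface whose directly connected subnet contains the destination, with no static route involved. The interface table mirrors the eth0/eth1 example; egress_interface() is a hypothetical helper, and a real router would fall back to its static or default routes when nothing matches.

```python
import ipaddress

# Directly connected interfaces from the example above
CONNECTED = {
    "eth0": ipaddress.ip_interface("192.168.1.1/24"),
    "eth1": ipaddress.ip_interface("192.168.2.1/24"),
}


def egress_interface(dst: str):
    """Return the interface whose connected subnet contains dst, else None."""
    addr = ipaddress.ip_address(dst)
    for name, iface in CONNECTED.items():
        if addr in iface.network:
            return name
    return None  # no connected route: would need a static or default route


print(egress_interface("192.168.2.244"))  # eth1
print(egress_interface("192.168.1.244"))  # eth0
```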
Anyway, I'll redo my network (and hopefully unfuck it lol)
Thanks a ton for the help!
(lmao 570+ messages)
No problem!
And have a nice day
you too!