definitely a kill everyone day
isc-dhcp-server isn't relaying packets properly after the power outage. why? WHO THE FUCK KNOWS, IT'S LINUX. (current Debian, to be specific.)
* Packets to door (the server) from DHCP machines get there 100% of the time. (Also true for packets from the fixed-IP LAN.)
* Packets originating on door get to all destinations 100% of the time.
* Packets to the fixed-IP LAN from DHCP machines (going through door) get there about 50% of the time.
* Packets to the modem itself (which has a fixed IP address on the same subnet as the fixed-IP LAN) from the same sources get through 10%-20% of the time. This is also the same card as the fixed-IP LAN. Why is one address on the fixed-IP LAN getting special treatment? FUCKIN' GOT ME. I presume it has something to do with being the gateway. But if we take out the exceptions in the routing table for the servers, and send everything through said gateway, nothing changes. Packet failure at about the same rate across the same classes.
* No DHCP client packets get past the points described. (Zero packets to internet, even ones that reach the internet modem.)
* All servers (fixed-IP LAN) talk to the internet just fine.
WHAT. THE. FUCK.
eta: It gets stupider. door drops half of DHCP-sourced packets to newmoon (fixed-IP LAN), but drops none to lodestone (web server). Didn't think to play with that because there is zero reason for them to be different. And yet.
eta2: it's not the NICs. or the switches. don't ask. yes, three hardware iterations later, I'm sure. It's not hardware.
eta3: apt-get remove and apt-get install of isc-dhcp-server changed NOTHING. what. the. fuck.
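Loss figures like the ones in the list above can be collected per destination with a small loop around ping. This is only a sketch: the probe count and the idea of passing host names as arguments are assumptions, not what was actually run here.

```shell
#!/bin/sh
# Sketch: report packet loss to each host named on the command line.
# COUNT is an illustrative default, not a value from the post.
COUNT="${COUNT:-20}"

# Read ping output on stdin and print just the loss percentage
# from the summary line ("... 50% packet loss ...").
parse_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# With no arguments the loop does nothing.
for host in "$@"; do
    loss="$(ping -c "$COUNT" -q "$host" 2>/dev/null | parse_loss)"
    echo "$host loss=${loss:-unknown}"
done
```

Run from a DHCP client and again from door itself (for example with newmoon and lodestone as arguments) to compare the two paths side by side.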
no subject
Date: 2017-11-02 01:35 am (UTC)
no subject
Date: 2017-11-02 02:59 am (UTC)
no subject
Date: 2017-11-02 03:27 am (UTC)
no subject
Date: 2017-11-02 03:41 am (UTC)
ip route show
ip rule show
no subject
Date: 2017-11-02 03:54 am (UTC)
It sounds to me like the routing engine is having trouble determining which NIC to send which traffic to, sometimes, and that wouldn't be something isc-dhcp-server was responsible for. But it could be something policy-based routing was messing up.
Oh, also worth double-checking that the subnet mask on the dhcp-assigned machines is sensible. I can't think of a reason a bad subnet mask would cause intermittent routing problems, but I've seen it cause other mysterious-seeming problems, so it's worth verifying.
no subject
Date: 2017-11-02 04:00 am (UTC)
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.2 192.168.1.210;
    option routers 192.168.1.1;
}

door# ip route show
default via 173.160.243.46 dev eth2
173.160.243.0/24 dev eth2 proto kernel scope link src 173.160.243.41
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1

door# ip rule show
0: from all lookup local
32766: from all lookup main
32767: from all lookup default

door# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         murkworks.net   0.0.0.0         UG    0      0        0 eth2
173.160.243.0   *               255.255.255.0   U     0      0        0 eth2
192.168.0.0     *               255.255.255.0   U     0      0        0 eth0
192.168.1.0     *               255.255.255.0   U     0      0        0 eth1
no subject
Date: 2017-11-02 04:42 am (UTC)
You don't have a full /24 reservation from Comcast, do you? I don't think that could have anything to do with this problem, but that looks wrong to me, unless they gave you all of 173.160.243.0/24, in which case them putting your gateway on .46 is a weird choice.
So, your servers on static addresses are somewhere in 173.160.243.42-45, I'm guessing, with door being .41 and the modem being .46. They're all plugged into a switch connected to eth2? And then eth1 has dynamically assigned machines behind them, with isc-dhcp-server listening on that interface.
There must be something on door doing NAT for the 192.168.1.x machines, right? Is that iptables?
As for the difference between lodestone and newmoon, do their routing tables look the same?
no subject
Date: 2017-11-02 05:27 am (UTC)
door is only gatewaying for the DHCP side, due to Idiotic Comcast Reasons which makes me want to Murder. The fundamental topology is:
modem -> netgear switching hub
netgear switching hub -> all fixed-IP servers including door
door (DHCP server) -> DHCP devices switching hub
DHCP devices switching hub -> wifi bridge, various other clients.
So again, door does no routing for fixed-IP devices.
eth0 on door is the motherboard NIC and was the uplink to the fixed-IP hub, but as part of the Is This Hardware? game, it's been moved around. I could move it back.
eth1 is currently running the DHCP server (isc-dhcp-server)
eth2 is currently the uplink to the fixed-IP switching hub which is connected to the Comcast modem.
(There are three NICs in door because I'd wanted it to do the fixed-IP subnet routing, but again, Idiotic Comcast Reasons. One has just been sitting in there unused for a while. It has an IP address which is not in any of the other subnets and is not routed to.)
no subject
Date: 2017-11-02 05:54 am (UTC)
If the fixed-IP machines aren't using door as a gateway, then this has got to be a problem with NAT somehow. What do you get for:
iptables -t nat -L
iptables -L
cat /proc/sys/net/ipv4/ip_forward
on door?
no subject
Date: 2017-11-02 05:59 am (UTC)
no subject
Date: 2017-11-02 06:04 am (UTC)
no subject
Date: 2017-11-02 06:30 am (UTC)
It does occur to me that I haven't specifically reset masquerading. So I'll do that tomorrow. Maybe it fell out of iptables somehow.
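For reference, resetting masquerading on a box laid out like door might look roughly like the following. It is a sketch under assumptions: the interface names and subnet come from the routing table posted earlier in the thread, but the actual rule set on door was never shown, so every rule here is a guess at a conventional NAT setup. By default it only prints the commands; set APPLY=1 (as root) to run them.

```shell
#!/bin/sh
# Sketch only. eth1/eth2 and 192.168.1.0/24 are taken from the posted
# "ip route show" output; the rules themselves are assumed, not door's.
LAN_IF="${LAN_IF:-eth1}"              # DHCP side
WAN_IF="${WAN_IF:-eth2}"              # uplink toward the modem
LAN_NET="${LAN_NET:-192.168.1.0/24}"

# Print each command unless APPLY=1, in which case execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

nat_rules() {
    # Forwarding must be on, or nothing crosses between the NICs.
    run sysctl -w net.ipv4.ip_forward=1
    # Drop whatever is left in the NAT POSTROUTING chain, then re-add masquerade.
    run iptables -t nat -F POSTROUTING
    run iptables -t nat -A POSTROUTING -s "$LAN_NET" -o "$WAN_IF" -j MASQUERADE
    # Let DHCP clients out, and let replies to their connections back in.
    run iptables -A FORWARD -i "$LAN_IF" -o "$WAN_IF" -j ACCEPT
    run iptables -A FORWARD -i "$WAN_IF" -o "$LAN_IF" -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
}

nat_rules
```

Note that the two FORWARD rules are appended, so running this repeatedly without flushing FORWARD first would stack duplicates.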
no subject
Date: 2017-11-02 06:32 am (UTC)
I only ask because the most recent time I had a baffling packet loss situation on my home network with perfectly fine connections between some pairs of machines and 20-90% packet loss between others, it was a consumer switch that was the problem. I'd never seen a switch fail like that, but picking up a $30 replacement solved my issue.
no subject
Date: 2017-11-02 06:44 am (UTC)
The thing I can't figure out is, if it's dropped masquerade, how is it ever working at all?
no subject
Date: 2017-11-02 06:48 am (UTC)
no subject
Date: 2017-11-02 06:33 pm (UTC)
HOW DOES THAT EVEN HAPPEN FUCK IDK
but I flushed all the things and rebuilt it from scratch and it's going like a champ again.
I have a theory that requires iptables and our switches to be smart in kind of stupid ways, and a couple of other things just to be stupid, and the one that just has to be stupid is the Comcast modem, so I think that's a safe bet.
Thanks for chatting with me and stuff, it helped this fall out of my brain. I appreciate it.
no subject
Date: 2017-11-02 06:38 pm (UTC)
no subject
Date: 2017-11-02 06:42 pm (UTC)
My long-term fix is an if-up.d script that just throws everything the fuck out and does it over again in ways that shouldn't be necessary but fuck that we're doin' this thing. :D
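A hook along those lines could be sketched as below. This is not the actual script, which was never posted. The hook name, the rules file path, and the choice of iptables-restore are all assumptions. It relies on ifupdown exporting IFACE to scripts in /etc/network/if-up.d/, and on iptables-restore flushing each table it loads unless told otherwise.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /etc/network/if-up.d/rebuild-nat (name assumed).
# RULES_FILE is assumed to have been written earlier with iptables-save.
WAN_IF="${WAN_IF:-eth2}"                          # uplink, per the posted route table
LAN_IF="${LAN_IF:-eth1}"                          # DHCP side
RULES_FILE="${RULES_FILE:-/etc/iptables.rules}"

# Decide whether the interface that just came up matters for NAT.
hook_action() {
    case "$1" in
        "$WAN_IF"|"$LAN_IF") echo "reload" ;;
        *) echo "skip" ;;
    esac
}

# Only touch the firewall when APPLY=1, so a dry run is harmless.
if [ "$(hook_action "${IFACE:-}")" = "reload" ] && [ "${APPLY:-0}" = "1" ]; then
    sysctl -w net.ipv4.ip_forward=1
    # Without --noflush this replaces the contents of every table in the file.
    iptables-restore < "$RULES_FILE"
fi
```

Reloading a saved rule set rather than re-adding rules one by one keeps the hook idempotent, which matters because it fires once per interface that comes up.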
no subject
Date: 2017-11-02 07:32 pm (UTC)
no subject
Date: 2017-11-02 05:32 am (UTC)
newmoon is .42 (mail and related server)
lodestone is .43 (web and related server)
all of this is discoverable through DNS so.
.44 was supposed to be the third card in door, to gateway to a fixed-IP subnet, but Fuck You Comcast. .45 is currently my PS4 Pro because OVERWATCH IS LIFE.
no subject
Date: 2017-11-02 05:34 am (UTC)
no subject
Date: 2017-11-02 05:30 am (UTC)
no subject
Date: 2017-11-02 03:52 am (UTC)
door has two cards. one connecting to the modem, one serving dhcp. DHCP devices live off the second card. door bridges between them.
I'm pretty sure it has to be something down in route. But route looks right. And nothing I do makes it work any better/differently, so.
no subject
Date: 2017-11-02 04:51 am (UTC)