whelp

Nov. 1st, 2017 05:06 pm
solarbird: (molly-thats-not-good-green)
[personal profile] solarbird
definitely a kill everyone day

isc-dhcp-server isn't relaying packets properly after the power outage. why? WHO THE FUCK KNOWS, IT'S LINUX. (current Debian, to be specific.)

* Packets to door (the server) from DHCP machines get there 100% of the time. (Also true for packets from the fixed-IP LAN.)
* Packets originating on door get to all destinations 100% of the time.
* Packets to the fixed-IP LAN from DHCP machines (going through door) get there about 50% of the time.
* Packets to the modem itself (which has a fixed IP address on the same subnet as the fixed-IP LAN) from the same sources get through 10%-20% of the time. The modem is also on the same card as the fixed-IP LAN. Why is one address on the fixed-IP LAN getting special treatment? FUCKIN' GOT ME. I presume it has something to do with being the gateway. But if we take out the exceptions in the routing table for the servers, and send everything through said gateway, nothing changes. Packets fail at about the same rate across the same classes.
* No DHCP client packets get past the points described. (Zero packets to internet, even ones that reach the internet modem.)
* All servers (fixed-IP LAN) talk to the internet just fine.
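
For anyone wanting to put numbers on a list like the one above, here is a minimal sketch that pings a target and reads the loss percentage from ping's summary line. The function names (`parse_loss`, `measure_loss`) and the default count of 20 are made up for illustration, and it assumes a Linux iputils-style `ping` that takes `-c` and prints "N% packet loss".

```python
import re
import subprocess

# Matches the summary line iputils ping prints, e.g. "50% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output: str) -> float:
    """Pull the packet-loss percentage out of ping's summary line."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_loss(target: str, count: int = 20) -> float:
    """Ping target `count` times and return the percentage lost."""
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when no replies come back at all
    )
    return parse_loss(result.stdout)
```

Running `measure_loss("newmoon")` and `measure_loss("lodestone")` from a DHCP client, and again from door itself, would give one loss figure per source/destination pair to compare.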

WHAT. THE. FUCK.

eta: It gets stupider. door drops half of DHCP-sourced packets to newmoon (fixed-IP LAN), but drops none to lodestone (web server). Didn't think to play with that, since there is zero reason for them to be different. And yet.

eta2: it's not the NICs. or the switches. don't ask. yes, three hardware iterations later, I'm sure. It's not hardware.

eta3: apt-get remove and apt-get install of isc-dhcp-server changed NOTHING. what. the. fuck.

Date: 2017-11-02 01:35 am (UTC)
rmd: (Default)
From: [personal profile] rmd
If it's a modern Debian, I'll just reflexively blame systemd.

Date: 2017-11-02 03:27 am (UTC)
From: [identity profile] rjl20.wordpress.com
Are the machines on the dhcp-client lan getting assigned IP addresses? If so, then maybe I'm confused about what role isc-dhcp-server is playing, because I don't think of that as something that routes traffic itself, except insofar as it's sitting on a box on your network which also, through some other mechanism such as iptables, routes traffic.

Date: 2017-11-02 03:41 am (UTC)
From: [identity profile] rjl20.wordpress.com
Does current Debian use iproute2? I'm out of practice. I think it does, though. What's the output of the following commands on door?

ip route show
ip rule show

Date: 2017-11-02 03:54 am (UTC)
From: [identity profile] rjl20.wordpress.com
I ask because I've seen similar-ish behavior in a network where the router had some policy-based routing going on, and that was a bugger to troubleshoot. Although I don't know why a power outage would have affected it for you; in my case it was definitely an initial misconfiguration problem, and started off broken.

It sounds to me like the routing engine is having trouble determining which NIC to send which traffic to, sometimes, and that wouldn't be something isc-dhcp-server was responsible for. But it could be something policy-based routing was messing up.

Oh, also worth double-checking that the subnet mask on the dhcp-assigned machines is sensible. I can't think of a reason a bad subnet mask would cause intermittent routing problems, but I've seen it cause other mysterious-seeming problems, so it's worth verifying.
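
A quick way to sanity-check a mask is to ask whether the client and its gateway land in the same network under it. This is only a sketch: `same_subnet` is a made-up helper, and the addresses in the usage note are illustrative guesses, not values taken from the actual network.

```python
import ipaddress


def same_subnet(host: str, gateway: str, netmask: str) -> bool:
    """True if host and gateway fall in the same network under netmask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{host}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway) in network
```

With a hypothetical gateway at 192.168.1.1, `same_subnet("192.168.1.200", "192.168.1.1", "255.255.255.0")` is true, while the same pair under 255.255.255.128 is false, which is the kind of mismatch worth ruling out.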

Date: 2017-11-02 04:42 am (UTC)
From: [identity profile] rjl20.wordpress.com
Huh.

You don't have a full /24 reservation from Comcast, do you? I don't think that could have anything to do with this problem, but that looks wrong to me, unless they gave you all of 173.160.243.0/24, in which case them putting your gateway on .46 is a weird choice.

So, your servers on static addresses are somewhere in 173.160.243.42-45, I'm guessing, with door being .41 and the modem being .46. They're all plugged into a switch connected to eth2? And then eth1 has the dynamically assigned machines behind it, with isc-dhcp-server listening on that interface.
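
To show why a /24 looks off for those addresses, here is a small sketch that finds the smallest block covering .41 through .46. The helper name is made up, and the result is only what the guessed addresses imply, not a statement of what Comcast actually assigned.

```python
import ipaddress


def smallest_covering_block(addresses):
    """Return the smallest IPv4 network containing every address given."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    if not addrs:
        raise ValueError("need at least one address")
    # Widen the prefix one bit at a time until every address fits.
    for prefix in range(32, -1, -1):
        candidate = ipaddress.ip_network(f"{addrs[0]}/{prefix}", strict=False)
        if all(a in candidate for a in addrs):
            return candidate
    raise ValueError("no common network")  # unreachable: /0 covers everything


# door (.41) and the modem (.46), per the guess above
block = smallest_covering_block(["173.160.243.41", "173.160.243.46"])
print(block)                            # 173.160.243.40/29
print([str(h) for h in block.hosts()])  # the six usable hosts, .41 through .46
```

A /29 has exactly six usable host addresses, so a gateway on .46 is simply the last one in the block, which would be a much less weird choice than .46 in a full /24.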

There must be something on door doing NAT for the 192.168.1.x machines, right? Is that iptables?

As for the difference between lodestone and newmoon, do their routing tables look the same?

Date: 2017-11-02 04:51 am (UTC)
From: [identity profile] rjl20.wordpress.com
What's the third card, eth0, doing?

Date: 2017-11-02 05:54 am (UTC)
From: [identity profile] rjl20.wordpress.com
Man, I'm at a bit of a loss. Had the configuration that was working until now ever survived a reboot of door? All I can think of to explain the power outage causing a network failure would be a configuration implemented from the command line on door and not made persistent in a config file somewhere.

If the fixed-IP machines aren't using door as a gateway, then this has got to be a problem with NAT somehow. What do you get for:

iptables -t nat -L
iptables -L
cat /proc/sys/net/ipv4/ip_forward

on door?
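
For reading the last of those: the file holds 1 when the kernel forwards IPv4 packets between interfaces and 0 when it does not. A value written straight to /proc only lasts until the next boot unless it is also set in the sysctl configuration, which ties in with the persistence question. A minimal sketch of the check, with made-up function names:

```python
from pathlib import Path

IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def parse_ip_forward(text: str) -> bool:
    """Interpret the file contents: '1' means IPv4 forwarding is on."""
    return text.strip() == "1"


def forwarding_enabled(path: Path = IP_FORWARD) -> bool:
    """Read the kernel setting (Linux only) and report whether forwarding is on."""
    return parse_ip_forward(path.read_text())
```

If `forwarding_enabled()` came back false on door, nothing from the DHCP side would be routed at all, so a partial failure like this one points more toward NAT or routing rules than toward this switch.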

Date: 2017-11-02 05:59 am (UTC)
From: [identity profile] rjl20.wordpress.com
Oh, I see you've taken it down and brought it up before, and it survived. So... huh. I still think it's got to be NAT related, but I've got no idea how.

Date: 2017-11-02 06:04 am (UTC)
From: [identity profile] rjl20.wordpress.com
To confirm it isn't somehow an issue with dhcp itself, you could statically assign one of the dhcp-net machines an address above 192.168.1.210, if you haven't tried that yet.

Date: 2017-11-02 06:32 am (UTC)
From: [identity profile] rjl20.wordpress.com
And I know you've repeatedly said it's not hardware, and that you've tried cutting out the switch on the dhcp-lan side with no change. But just to double check, have you tried cutting out the switch on the fixed-ip side? Plug the modem (or newmoon) straight into door?

I only ask because the most recent time I had a baffling packet loss situation on my home network with perfectly fine connections between some pairs of machines and 20-90% packet loss between others, it was a consumer switch that was the problem. I'd never seen a switch fail like that, but picking up a $30 replacement solved my issue.

Date: 2017-11-02 06:48 am (UTC)
From: [identity profile] rjl20.wordpress.com
Yeah, that's what's throwing me, too. It should work or not. This partial failure, I'm stumped.

Date: 2017-11-02 06:38 pm (UTC)
From: [identity profile] rjl20.wordpress.com
Wow, what the hell? I mean, I'm glad you've got it resolved, but now I'm worried that things I've set up are going to break in unusual and intermittent ways.
