Nov. 1st, 2017 05:06 pm
solarbird: (molly-thats-not-good-green)
definitely a kill everyone day

isc-dhcp-server isn't relaying packets properly after the power outage. why? WHO THE FUCK KNOWS, IT'S LINUX. (current Debian, to be specific.)

* Packets to door (the server) from DHCP machines get there 100% of the time. (Also true for packets from the fixed-IP LAN.)
* Packets originating on door get to all destinations 100% of the time.
* Packets to the fixed-IP LAN from DHCP machines (going through door) get there about 50% of the time.
* Packets to the modem itself (which has a fixed IP address on the same subnet as the fixed-IP LAN) from the same sources get through 10%-20% of the time. This is also the same card as the fixed-IP LAN. Why is one address on the fixed-IP LAN getting special treatment? FUCKIN' GOT ME. I presume it has something to do with being the gateway. But if we take out the exceptions in the routing table for the servers, and send everything through said gateway, nothing changes. Packets fail at about the same rate across the same classes.
* No DHCP client packets get past the points described. (Zero packets to internet, even ones that reach the internet modem.)
* All servers (fixed-IP LAN) talk to the internet just fine.
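
For putting harder numbers on the loss rates in the list above, here is a minimal sketch that pulls the loss fraction out of a ping summary line. It assumes the usual Linux (iputils) or macOS/BSD summary wording; the sample line is made up and is not a measurement from this network.

```python
import re

# Matches the summary line printed by both Linux (iputils) and macOS/BSD ping.
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+)(?: packets)? received")


def loss_fraction(ping_output: str) -> float:
    """Return the fraction of probes lost (0.0 to 1.0) from ping's summary line."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping sent no packets")
    return (sent - received) / sent


# Made-up sample summary, not a measurement from this network.
sample = "20 packets transmitted, 10 received, 50% packet loss, time 19030ms"
print(f"{loss_fraction(sample):.0%} lost")
```

Running the same count of pings for each source/destination pair (DHCP to door, DHCP to fixed-IP, DHCP to modem) and feeding the output through this would make the classes directly comparable.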


eta: It gets stupider. door drops half of DHCP-sourced packets to newmoon (fixed-IP lan), but drops none to lodestone (web server). Didn't think to play with that because there is zero reason for them to be different. And yet.

eta2: it's not the NICs. or the switches. don't ask. yes, three hardware iterations later, I'm sure. It's not hardware.

eta3: apt-get remove and apt-get install of isc-dhcp-server changed NOTHING. what. the. fuck.
solarbird: (molly-computer-all-lit-up)
That way you're replacing the server bungees and suddenly realise that most people will never say they've upgraded the bungees on their servers, like, ever.
solarbird: (molly-thats-not-good-green)
The 1995 P166 that has been until now door.murkworks.net has formally and abruptly retired itself. So I'm having to move the new box into place now. This is the DMZ box I was talking about earlier.

Henceforth, "Door" refers to "New Door," not the old machine that is broken. It is latest Debian.

Door has three network cards: eth0 going to cable modem, eth1 going to fixed IP LAN segment, eth2 going to DHCP LAN segment. Door is running both DNS and DHCP servers.

Door can see everything in the world, on all cards. Complete functionality.

DHCP side can see everything in the world, on all cards. Complete functionality.

Fixed IP machines can all see Door (including its DNS services), and each other, and talk to the DHCP side, but can talk to nothing living out on eth0.

tcpdump on Door shows Door handing off ICMP packets on eth0, so that direction seems okay.

I am not seeing ACKs coming back to Door on eth0 from google.com but I can't be sure they aren't doing something tricky and my filters are confused.

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UG    0      0        0 eth0
                                                U     0      0        0 eth1
                                                UH    0      0        0 eth1
                                                UH    0      0        0 eth1
                                                UH    0      0        0 eth1
                                                UH    0      0        0 eth0
                                                U     0      0        0 eth2

Door is (on eth0), (on eth1), (on eth2). is the modem. is a network to eth1.

Anybody know wtf?

eta: The router - in addition to not showing door any ACKs for anything from .42 and .43 - is sending out a lot of ARP packets looking for and, and I'm starting to think it won't talk to a gateway box in the fixed-IP range. I try to add .41 as a gateway address for .42 and .43 and it refuses, saying illegal LAN address. SUPER RAGIFICATION ENGAGED.

eta2: And the new problem is that the PS4 won't pick up the gateway information from the Linux-based DHCP server. It will pick up an address! But it's not getting the DNS server number either. Why? Fuck if I know, everything else does it right.
solarbird: (molly-thats-not-good-green)
Okay this is intensely stupid because debian

network card assignment is being random, because debian

i'm blacklisting the drivers and trying to load them in /etc/modules in correct order so the device assignment is consistent

i blacklist e1000e and r8169 (so they don't autoload)

and then have them both in /etc/modules so they do load in a specific order, r8169 then e1000e

e1000e loads

r8169 doesn't

modprobe r8169 loads fine

what. the. fuck. do. i. do. here.
solarbird: (Default)
Anyone interested in a Tektronix 4317? It's about 25 years old, I don't have a keyboard, but last time it was turned on (about 20 years ago?) it booted to a login prompt. I think it's a 68020 in there and I think it was running some flavour of BSD. No promises. Also, no password.


solarbird: (fascist sons o bitches)
Well, this is fun. Got anything - I mean anything, not just computers - that has public- or semi-public-facing USB ports? Not for long.

solarbird: (molly-computer-all-lit-up)
I will say this for Computers Exploding, which in this case was the digital audio workstation dying on me.

Sure, it'll drive me mad, but it will set me off on a FIX ALL THE THINGS tear, which is why journalpress and social have enough bugs fixed that they should work through PHP7(!), why I suddenly had 60-odd lost-in-the-queue comments reappear overnight, why I don't get timeouts on the LAN talking to the Wordpress administrative pages anymore, why in future we can talk to our webserver over the LAN even when Comcast dies on us, and why I won't have to tinker with newmoon or lodestone after reboots now, since greylisting and varnish both finally come up correctly the first time.

Fix. All. The. Things.
solarbird: (molly-thats-not-good-green)
eta: A solution? timeout_req defaults to 2 seconds and is how long varnish waits for request header data from the client. Setting that to 10 seconds makes the problem go away. (I just picked a big number - most of the timeouts below ranged between >2sec and <6sec.)

This could sorta make sense except the problem didn't repro from off the LAN, and the LAN is much faster than our WAN uplink. Which implies that there is something our LAN is doing at random which radically slows down packet travel from client machines to server machines with fixed IP addresses.

This is so weird.

Here's pinging the webserver from this laptop:

round-trip min/avg/max/stddev = 1.543/4.381/7.130/1.321 ms

Here's pinging this laptop from the webserver:

rtt min/avg/max/mdev = 2.252/53.628/78.781/24.769 ms

Is this relevant? Possibly? I don't know. But I do know that the 'fast route' also goes from the network address translation section of the LAN over to the fixed-IP section. That's got to be part of it.
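
The two ping summaries above can be compared directly. A minimal sketch, assuming only the two summary formats shown (Linux prints `rtt ... mdev`, macOS/BSD prints `round-trip ... stddev`); the function name and dictionary keys are made up for illustration.

```python
import re

# Linux prints "rtt min/avg/max/mdev = ...", macOS/BSD prints
# "round-trip min/avg/max/stddev = ..."; the four numbers are in the same order.
_RTT = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line: str) -> dict:
    """Return min/avg/max/dev in milliseconds from a ping summary line."""
    match = _RTT.search(line)
    if match is None:
        raise ValueError("not a ping round-trip summary line")
    keys = ("min", "avg", "max", "dev")
    return dict(zip(keys, map(float, match.groups())))


# The two summary lines quoted in the post.
laptop_to_server = parse_rtt("round-trip min/avg/max/stddev = 1.543/4.381/7.130/1.321 ms")
server_to_laptop = parse_rtt("rtt min/avg/max/mdev = 2.252/53.628/78.781/24.769 ms")

ratio = server_to_laptop["avg"] / laptop_to_server["avg"]
print(f"server-to-laptop average is {ratio:.1f}x the laptop-to-server average")
```

On those two lines the average round trip comes out roughly twelve times slower in the server-to-laptop direction, which at least quantifies the asymmetry even if it doesn't explain it.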


Okay, so, when I get these dropped server connections - which seem to happen only on the LAN, not in the outside world - I see a bunch of these blocks in varnish's log. But I don't know what these are, other than oh hi timeouts.

From the client standpoint, there's no waiting; you click on the link and instantly get a "server has dropped connection" message. Boom. Much faster than actually loading a page.

Anybody know how to read varnish logs?

* << Session >> 65693
- Begin sess 0 HTTP/1
- SessOpen 53818 :80 80 1469083778.683280 14
- Link req 65694 rxreq
- Link req 65696 rxreq
- Link req 65698 rxreq
- Link req 65700 rxreq
- Link req 65702 rxreq
- Link req 32788 rxreq
- Link req 360472 rxreq
- SessClose RX_TIMEOUT 6.262
- End

* << Session >> 146
- Begin sess 0 HTTP/1
- SessOpen 53820 :80 80 1469083780.279525 16
- Link req 147 rxreq
- Link req 149 rxreq
- Link req 151 rxreq
- Link req 153 rxreq
- Link req 229406 rxreq
- Link req 327718 rxreq
- SessClose RX_TIMEOUT 4.675
- End

* << Session >> 294941
- Begin sess 0 HTTP/1
- SessOpen 53819 :80 80 1469083780.278725 15
- Link req 294942 rxreq
- Link req 294944 rxreq
- Link req 294946 rxreq
- Link req 196635 rxreq
- SessClose RX_TIMEOUT 4.685
- End

* << Session >> 360467
- Begin sess 0 HTTP/1
- SessOpen 53824 :80 80 1469083780.331048 20
- Link req 360468 rxreq
- Link req 65704 rxreq
- SessClose RX_TIMEOUT 4.641
- End

* << Session >> 98345
- Begin sess 0 HTTP/1
- SessOpen 53821 :80 80 1469083780.292602 18
- Link req 98346 rxreq
- Link req 32782 rxreq
- Link req 360470 rxreq
- SessClose RX_TIMEOUT 4.689
- End

* << Session >> 327711
- Begin sess 0 HTTP/1
- SessOpen 53823 :80 80 1469083780.331335 22
- Link req 327716 rxreq
- SessClose RX_TIMEOUT 4.660
- End

using varnishlog directly, it looks more like this:

    328293 SessClose      c RX_TIMEOUT 10.560
    328293 End            c
    196743 SessClose      c RX_TIMEOUT 5.183
    196743 End            c
    163890 SessClose      c RX_TIMEOUT 5.202
    163890 End            c
    360556 SessClose      c RX_TIMEOUT 3.808
    360556 End            c
    393291 SessClose      c RX_TIMEOUT 3.823
    393291 End            c
    131082 SessClose      c RX_TIMEOUT 3.837
    131082 End            c
         0 ExpKill        - EXP_Expired x=32809 t=-10
         0 ExpKill        - EXP_Expired x=393245 t=-10
         0 CLI            - Rd ping
         0 CLI            - Wr 200 19 PONG 1469085933 1.0
         0 ExpKill        - EXP_Expired x=393247 t=-10
    196758 SessClose      c RX_TIMEOUT 5.413
    196758 End            c
    229455 SessClose      c RX_TIMEOUT 5.409
    229455 End            c
    131083 SessClose      c RX_TIMEOUT 5.412
    131083 End            c
    163908 SessClose      c RX_TIMEOUT 5.413
    163908 End            c
    360559 SessClose      c RX_TIMEOUT 5.409
    360559 End            c
    262286 SessClose      c RX_TIMEOUT 5.525
    262286 End            c
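
For reading these in bulk: in a `SessClose` record the word after the tag is the reason the session closed and the trailing number is how long the session was open, in seconds. Here is a minimal sketch that pulls those durations out of either layout shown above. The sample text reuses a few lines from the logs in this post; the function name is made up for illustration.

```python
import re
from statistics import median
from typing import List

# Matches both layouts shown above: the grouped "- SessClose RX_TIMEOUT 6.262"
# and raw varnishlog's "328293 SessClose      c RX_TIMEOUT 10.560".
_RX_TIMEOUT = re.compile(r"SessClose\s+(?:c\s+)?RX_TIMEOUT\s+([\d.]+)")


def rx_timeout_durations(log_text: str) -> List[float]:
    """Return the session durations (seconds) of every RX_TIMEOUT close."""
    return [float(m.group(1)) for m in _RX_TIMEOUT.finditer(log_text)]


# A few lines copied from the two log excerpts in the post.
sample = """\
- SessClose RX_TIMEOUT 6.262
- SessClose RX_TIMEOUT 4.675
    328293 SessClose      c RX_TIMEOUT 10.560
    196743 SessClose      c RX_TIMEOUT 5.183
    360556 SessClose      c RX_TIMEOUT 3.808
"""
durations = rx_timeout_durations(sample)
print(f"{len(durations)} timeouts, median {median(durations):.3f}s, max {max(durations):.3f}s")
```

Summarizing the durations this way is one rough basis for picking a timeout value instead of guessing a big number.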


solarbird: (Default)

How the FUCK is this shipping code? Seriously. Every attempt to deal with Ubuntu this week - starting with its mysterious inability to see PARTS OF my audio interface, leading to a massive upgrade fail to 14.04 (that I was able to fix), and then to a completely destructive failed upgrade to 16.04, has led to more and more monumental failures, and now we’re at the point where a freshly-made 16.04 LIVE DVD can’t even fucking see FILESYSTEMS consistently.

(That’s /home. It’s an ext4 filesystem. If I let the installer get it wrong and try to mount it as ext2, it fails and hangs the installer. If I tell it it’s ext4, it insists upon trying to format it and destroy all my data. FUCK YOU UBUNTU.)
solarbird: (molly-angry-crying)
I'm sitting here, stuck. I have no studio, I have things I'm SUPPOSED TO BE WORKING ON THIS WEEK, I have NO trust in Ubuntu, I have NO idea what I should be doing next - except that's not true, the next step is "SURE HOPE MY BACKUP IS GOOD" and go ahead with formatting.

But EVERY. SINGLE. STEP. I've let Ubuntu take has been massively destructive. EVERY ONE. Not ONE thing has worked right. I have no idea what it will do next and no confidence that it, fuck, I dunno, won't wipe the Archives partition or the Windows 8.1 partition that HEY GUESS WHAT I CAN'T RECREATE SINCE MICROSOFT WON'T LET ME, or WHO THE FUCK KNOWS WHAT.

This is grotesque.
solarbird: (molly-kill-everyone-with-sticks)
ubuntu 16.04 installer.

sees the /home partition.

thinks it's ext2.

it's not, but I go eh, it can mount it from the live view, it'll figure it out.

can't mount it as ext2, won't work. INSTALLER HANGS BECAUSE IT'S CRAP.


installer still thinks it's ext2. tell it no, it's ext4.


i hate you SO FUCKING MUCH ubuntu
solarbird: (molly-kill-everyone-with-sticks)

'k now get this

ubuntu 16.04 live disc has a repair/reinstall option, it'll replace the OS and reset all OS settings but try to keep installed apps and definitely keep user data


FIRST thing it does: mount the old OS drive and delete all the old OS files.

SECOND thing it does: unmounts old OS drive

THIRD thing it does: tries to remount old root (which it mounted before, I note) as ext2 file system. BUT if it's NOT ext2...


solarbird: (molly-braceforimpact)
We've just had two Stage 3 power events (multi-second complete outages) and come back from both. The first was a good 30 seconds, and I was headed to the server room to bring the murknet down. Then back up. The second one was a couple of minutes later, down seven seconds and back up. THIS IS FREAKISH AND NEVER HAPPENS, so nice re-routin' there, PSE.

But there is no way we are staying up through this storm. XD
solarbird: (molly-thats-not-good-green)


Debian, Ubuntu vulnerable too. Patches are up as of this morning. Get yours now!
solarbird: (molly-angry)

We’ve had to disable greylisting on our mail server, because ever since the latest round of security updates we loaded over the weekend, every dkim-using host in the world fails key retrieval at milter-greylist, and we don’t get mail from google or twitter or yahoo or much of anybody large anymore.

And there’s no way to just disable dkim check in milter-greylist.

Anybody have any idea what the fuck might have happened? Searching online finds me exactly nothing. Here’s a sample – every transaction involving DKIM-signed mail fails, every time, and it started at the weekend round of security patches:

Jan 25 23:31:25 newmoon sm-mta[978]: u0Q7VOMi000978: from=<ZZZZZZZZ@gmail.com>, size=2334, class=0, nrcpts=1, msgid=<CAAsYJfyDCB0w3uKXjie-uXF_Xskt524MuKU4=HHckYMkeDKZQg@mail.gmail.com>, proto=ESMTP, daemon=MTA, relay=mail-pf0-f179.google.com []
Jan 25 23:31:25 newmoon milter-greylist: DKIM failed: Key retrieval failed
Jan 25 23:31:25 newmoon sm-mta[978]: u0Q7VOMi000978: Milter: data, reject=451 4.3.2 Please try again later
Jan 25 23:31:25 newmoon sm-mta[978]: u0Q7VOMi000978: to=<YYYYYYYY@murkworks.net>, delay=00:00:00, pri=32334, stat=Please try again later
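
One thing that can be checked by hand: DKIM "key retrieval" is a DNS TXT lookup for `<selector>._domainkey.<domain>`, with both values taken from the `s=` and `d=` tags of the message's DKIM-Signature header. Below is a minimal sketch that builds that name, so the same lookup can be tried from the mail server itself (with `dig` or similar). Whether DNS is actually what milter-greylist is choking on here is an assumption, and the example signature is invented.

```python
def parse_tags(header_value: str) -> dict:
    """Split a DKIM-Signature value ("v=1; a=...; d=...; s=...") into its tags."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            # Split on the first "=" only; base64 values may end in "=" padding.
            name, _, value = part.partition("=")
            # Folded headers may contain whitespace inside values; drop it.
            tags[name.strip()] = "".join(value.split())
    return tags


def dkim_key_name(header_value: str) -> str:
    """Return the DNS name whose TXT record holds the signer's public key."""
    tags = parse_tags(header_value)
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical signature; the selector and domain are invented for illustration.
example = "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=sel2016; bh=...; b=..."
print(dkim_key_name(example))
```

If a TXT query for that name works from another machine but fails or times out on the mail server, that points at the server's resolver setup rather than at the senders.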

Mirrored from Crime and the Blog of Evil.

solarbird: (molly-kill-everyone-with-sticks)
So far, I have to say, Debian Jessie is a trainwreck. Certainly as a mail server.

Current dovecot likes DELETING 90% OF YOUR MAIL ON FIRST ENCOUNTER. Gone. Poof. SO LONG. Something between the version we had been running and this version changed something about index numbers, and it's supposed to fix that silently, and the second time you show it your old mail queue it does. But I sure hope you kept that backup, because the first time, it just DELETES 90% OF YOUR MAIL.

Or in mine and Anna's cases, 100% of your mail. Seems to depend upon where the index conflict occurs. It seems to let you keep as much as 10% of it tho' if you have enough mail in your inbox.

And milter-greylist, holy hell.

So it's not just that the init.d startup for milter-greylist doesn't work, and it's not just that it fails silently with a false success report, and it's not just that it's wrong, I mean, just, completely, wrong, as in the environment variables are set to different values than the defaults in the default-for-sendmail config file, and hey! They! Won't! Work! Together!, it's 1) it appears to be some sort of screen for whatever is actually silently failing to start milter-greylist, because 2) when you can manage to get it to produce any output at all, it's clearly not generated by anything in the startup script itself, and 3) if you edit out code that does things like "check for this being disabled in defaults because your check is ignoring the actual data and reporting disabled when it's not," some version of that code somewhere still runs even though you have deleted it, and still comes up with the wrong result without fail.

And! AND! Even if you start it yourself, as root, and it runs for a little while just fine? As soon as it tries to write the database file out to disk the first time, it crashes. Dead.

Open Source is the Future of Pain
solarbird: (molly-thats-not-good-green)
MURKWORKS.NET USERS: newmoon, the mail server, is about to go down for critical unscheduled maintenance. No mail will be sent or received while we are down. We will most certainly be down the rest of the day. All other services should continue unaffected.
solarbird: (banzai institute)

Huh, that's neat. The wireless hub adds about 3ms to ping times, given the same number of wired routers/switches involved. (Wireless is an extra, third stage for those tests.)

All pings are against an idle Mac backup server. Middle numbers are wired from a really old, like, 1995ish, Pentium Linux box; top is from a 2011 Macbook Pro where I started poking at it; slightly improved numbers in the bottom column are a redo from a location with better wifi signal strength. Better signal did help, but not as much as I expected.

Initial Wireless - 2011 MacBook Pro (Low Signal)

Wired connection from 1995 Pentium

Second Wireless - 2011 MacBook Pro (Improved Signal)
