An ode to NSFNET and ANSnet: a simple NMS for home

A bit of history…

I started my computing career at the end of 1991 at NSFNET, which then became ANSnet. In those days, we had a home-brewed network monitoring system. I believe most or all of it was originally the brainchild of Bill Norton. Later there were several contributors: Linda Liebengood, myself and others. The important thing for today’s thoughts: it was named “rover”, and its user interface philosophy was simple but important: “Only show me actionable problems, and do it as quickly as possible.”

To understand this philosophy, you have to know something about the primary users: the network operators in the Network Operations Center (NOC). One of their many jobs was to observe problems, perform initial triage, and document their observations in a trouble ticket. From there they might fix the problem, escalate to network engineering, etc. But it wasn’t expected that we’d have some omniscient tool that could give them all of the data they (or anyone else) needed to resolve the problem. We expected everyone to use their brains, and we wanted our primary problem reporter to be fast and as clutter-free as possible.

For decades now, I’ve spent a considerable amount of time working at home. Sometimes because I was officially telecommuting, at other times just because I love my work and burn midnight hours doing it. As a result, my home setup has become more complex over time. I have 10 gigabit ethernet throughout the house (some fiber, some Cat6A).  I have multiple 10 gigabit ethernet switches, all managed.  I have three rackmount computers in the basement that run 7×24.  I have ZFS pools on two of them, used for nightly backups of all networked machines, source code repository redundancy, Time Machine for my macOS machines, etc.  I run my own DHCP service, an internal DNS server, web servers, an internal mail server, my own automated security software to keep my pf tables current, Unifi, etc.  I have a handful of Raspberry Pis doing various things.  Then there are all the other devices: desktop computers in my office, a networked laser printer, Roku, AppleTV, Android TV, Nest thermostat, Nest Protects, WiFi access points, laptops, tablet, phone, watch, Ooma, etc.  And the list grows over time.

Essentially, my home has become somewhat complex.  Without automation, I spend too much time checking the state of things or just being anxious about not having time to check everything at a reasonable frequency.  Are my ZFS pools all healthy?  Are all of my storage devices healthy?  Am I running out of storage space anywhere?  Is my DNS service working?  Is my DHCP server working?  My web server?  NFS working where I need it?  Is my Raspberry Pi garage door opener working?  Are my domains resolvable from the outside world?  Are the cloud services I use working?  Is my Internet connection down?  Is there a guest on my network?  A bandit on my network?  Is my printer alive?  Is my internal mail service working?  Are any of my UPS units running on battery?  Are there network services running that should not be?  What about the ones that should be, like sshd?

I needed a monitoring system that worked like rover: only show me actionable issues.  So I wrote my own, and named it “mcrover”.  It’s more of a host and service monitoring system than a network monitoring system, but it’s distributed and secure (using ed25519 stuff in libDwmAuth).  It’s modern C++, relatively easy to extend, and has some fun bits (ASCII art in the curses client when there are no alerts, for example).  Like the old Network Operations Center, I have a dedicated display in my office that only displays the mcrover Qt client, 24 hours a day.  Since most of the time there are no alerts to display, the Qt client toggles between a display of the next week’s forecast and a weather radar image when it has nothing to report.  If there are alerts, the alert display is shown instead, and does not go away until there are no alerts (or I click on the page switch in the UI).  The dedicated display is driven by a Raspberry Pi 4B running the Qt client from boot, using EGLFS (no X11).  The Raspberry Pi 4 is powered via PoE.  It is also running the mcrover service, to monitor local services on the Pi as well as many network services.  In fact the mcrover service is running on every 7×24 general purpose computing device.  mcrover instances can exchange alerts, hence I only need to look at one instance to see what’s being reported by all instances.
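The display behavior described above is simple enough to sketch.  This is not mcrover’s code (mcrover is C++); it is a minimal Python illustration, and the names Page, merge_alerts and choose_page, plus the alert strings, are all made up for the example.  It assumes alerts from peers can be treated as a plain set union, and that the weather pages alternate on each refresh tick.

```python
from enum import Enum


class Page(Enum):
    ALERTS = "alerts"
    FORECAST = "forecast"
    RADAR = "radar"


def merge_alerts(local, peers):
    """Union of local alerts and the alerts received from peer instances.

    `peers` maps a (hypothetical) peer name to the alerts it reported.
    """
    merged = set(local)
    for peer_alerts in peers.values():
        merged |= set(peer_alerts)
    return merged


def choose_page(alerts, tick, user_override=None):
    """Pick the page to show for this refresh tick."""
    # A click on the page switch wins over everything else.
    if user_override is not None:
        return user_override
    # Any alert at all pins the alert page.
    if alerts:
        return Page.ALERTS
    # Nothing to report: alternate between the two weather pages.
    return Page.FORECAST if tick % 2 == 0 else Page.RADAR
```

With an empty alert set, successive ticks give FORECAST, RADAR, FORECAST and so on; a single alert from any instance switches every tick to ALERTS until it clears.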

This has relieved me of a lot of sys admin and network admin drudgery.  It wasn’t trivial to implement, mostly due to the variety (not the quantity) of things it’s monitoring.  But it has proven itself very worthwhile.  I’ve been running it for many months now, and I no longer get anxious about not always keeping up with things like daily/weekly/monthly mail from cron and manually checking things.  All critical (and some non-critical) things are now being checked every 60 seconds, and I only have my attention stolen when there is an actionable issue found by mcrover.
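The “only report what’s broken” idea can be shown in a few lines.  The following is a rough Python sketch, not how mcrover is implemented: run_checks and tcp_port_open are hypothetical names, and a check is assumed to be any zero-argument callable that returns True when healthy.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks):
    """Run every check and return only the failures.

    `checks` maps a description to a zero-argument callable that returns
    True when healthy.  A check that raises is reported as a failure too,
    so a broken check can't hide a problem.
    """
    alerts = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception as exc:
            alerts.append(f"{name}: check raised {exc!r}")
            continue
        if not healthy:
            alerts.append(name)
    return alerts
```

Calling run_checks once every 60 seconds with entries such as {"sshd on ria": lambda: tcp_port_open("ria", 22)} yields an empty list most of the time, which is the whole point: no output means nothing to do.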

So… an ode to the philosophy of an old system.  Don’t make me plow through a bunch of data to find the things I need to address.  I’ll do that when there’s a problem, not when there isn’t a problem.  For 7×24 general purpose computing devices running Linux, macOS or FreeBSD, I install and run the mcrover service and connect it to the mesh.  And it requires very little oomph; it runs just fine on a Raspberry Pi 3 or 4.

So why the weather display?  It’s just useful to me, particularly in the mowing season when I need to plan ahead for yard work.  And I’ve just grown tired of the weather websites.  Most are loaded with ads and clutter.  All of them are tracking us.  Why not just pull the data from tax-funded sources in JSON form and do it myself?  I’ve got a dedicated display which doesn’t have any alerts to display most of the time, so it made sense to put it there.
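For a sense of how little code the JSON side takes, here is a hedged Python sketch.  It assumes the source is the U.S. National Weather Service API (api.weather.gov), whose forecast documents carry a properties.periods list; the post doesn’t name the source, so treat that as an assumption.  The function names and the sample values are invented for illustration, and the fetch function is shown but not exercised.

```python
import json
import urllib.request


def fetch_forecast_json(url, user_agent):
    """Fetch a forecast document as text (needs network access).

    `url` is assumed to be an NWS forecast URL of the form
    https://api.weather.gov/gridpoints/{office}/{x},{y}/forecast
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def summarize_periods(text, limit=14):
    """Return (name, temperature, unit, shortForecast) per forecast period."""
    doc = json.loads(text)
    periods = doc.get("properties", {}).get("periods", [])
    return [
        (p["name"], p["temperature"], p["temperatureUnit"], p["shortForecast"])
        for p in periods[:limit]
    ]


# Made-up sample in the assumed layout, so the parser can be tried offline.
SAMPLE = """
{"properties": {"periods": [
  {"name": "Today", "temperature": 72, "temperatureUnit": "F",
   "shortForecast": "Sunny"},
  {"name": "Tonight", "temperature": 51, "temperatureUnit": "F",
   "shortForecast": "Mostly Clear"}
]}}
"""
```

summarize_periods(SAMPLE) gives two tuples, one per period, which is about all a forecast page needs before drawing.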

The Qt client using X11, showing the weather forecast.

The Qt client using X11, showing the weather radar.

The curses client, showing ASCII art since there are no alerts to be shown.

mcblockd’s latest trick works: drop TCP connections

Evidence in the logs of mcblockd’s latest feature working. It’s successfully killing TCP connections when it adds a prefix to one of the pf tables.

Apr 29 03:42:40 ria mcblockd: [I] Dropped TCP connection from 221.144.5.116:38440
Apr 29 03:42:40 ria mcblockd: [I] Added 221.144/12 (KR) to ssh_losers for 180 days
Apr 29 05:02:02 ria mcblockd: [I] Dropped TCP connection from 46.118.248.195:40294
Apr 29 05:02:02 ria mcblockd: [I] Added 46.118/15 (UA) to ssh_losers for 180 days
Apr 29 07:07:42 ria mcblockd: [I] Dropped TCP connection from 120.132.4.45:56388
Apr 29 07:07:42 ria mcblockd: [I] Added 120.128/13 (CN) to ssh_losers for 180 days
Apr 29 10:04:23 ria mcblockd: [I] Dropped TCP connection from 95.215.2.52:50862
Apr 29 10:04:23 ria mcblockd: [I] Added 95.215.0/22 (RU) to ssh_losers for 180 days
Apr 29 11:51:34 ria mcblockd: [I] Dropped TCP connection from 110.246.84.64:32309
Apr 29 11:51:34 ria mcblockd: [I] Added 110.240/12 (CN) to ssh_losers for 180 days
Apr 29 12:22:42 ria mcblockd: [I] Dropped TCP connection from 183.184.133.58:3369
Apr 29 12:22:42 ria mcblockd: [I] Added 183.184/13 (CN) to ssh_losers for 180 days
Apr 29 13:13:54 ria mcblockd: [I] Dropped TCP connection from 120.150.231.99:50357
Apr 29 13:13:54 ria mcblockd: [I] Dropped TCP connection from 120.150.231.99:50349
Apr 29 13:13:54 ria mcblockd: [I] Added 120.144/12 (AU) to ssh_losers for 180 days
Apr 29 14:42:30 ria mcblockd: [I] Dropped TCP connection from 113.209.68.135:53280
Apr 29 14:42:30 ria mcblockd: [I] Added 113.209/16 (CN) to ssh_losers for 180 days

mcblockd’s latest tricks: kill pf state, walk PCB list and kill TCP connections

Today I added a new feature to mcblockd to kill pf state for all hosts in a prefix when the prefix is added to one of my pf tables. This isn’t exactly what I want, but it’ll do for now.

mcblockd also now walks the PCB (protocol control block) list and drops TCP connections for hosts in a prefix I’ve just added to a pf table. Fortunately there was sample code in /usr/src/usr.sbin/tcpdrop/tcpdrop.c. The trick here is that I don’t currently have a means of mapping a pf table to where it’s applied (which ports, which interfaces). In the long term I might add code to figure that out, but in the interim I can configure ports and interfaces in mcblockd’s configuration file that will allow me to drop specific connections. For this first pass, I just toast all PCBs for a prefix.

The reason I added this feature: I occasionally see simultaneous login attempts from different IP addresses in the same prefix. If I’m going to block the prefix automatically, I want to cut off all of their connections right now, not after all of their connections have ended. Blowing away their pf state works, but leaves a hanging TCP connection in the ESTABLISHED state for a while. I want the PCBs to be cleaned up.
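The selection step (which connections belong to the prefix I just blocked) is easy to illustrate.  mcblockd does this in C++ against the kernel’s PCB list; the Python below is only a sketch of the matching logic, with hypothetical function names, and it assumes the short prefix form seen in the logs (for example 120.144/12) simply omits trailing zero octets.

```python
import ipaddress


def expand_prefix(short):
    """Expand a short form such as '221.144/12' into an IPv4 network.

    Assumes omitted trailing octets are zero and that a '/length' is present.
    """
    addr, _, length = short.partition("/")
    octets = addr.split(".")
    octets += ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(octets) + "/" + length)


def connections_in_prefix(connections, short_prefix):
    """Return the (address, port) pairs whose address falls in the prefix.

    `connections` stands in for the remote endpoints found while walking
    the PCB list; each match is a candidate for being dropped.
    """
    net = expand_prefix(short_prefix)
    return [
        (addr, port)
        for addr, port in connections
        if ipaddress.ip_address(addr) in net
    ]
```

Given the two connections from 120.150.231.99 plus an unrelated one, connections_in_prefix(..., "120.144/12") returns only the first two, which corresponds to two “Dropped TCP connection” entries followed by a single “Added” entry in the log.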

more on mcblockd automation progress

Similar to what I have for sshd, I have real time log processing on my web server. The secure remote communication with mcblockd is very nice to have, since my web server is a separate machine from my gateway/firewall. Below you can see offending web server log entries followed immediately by an action from mcblockd. Instant blocking, without my involvement.

185.36.102.114 - [20/Apr/2017:19:08:08] "GET /blog/xmlrpc.php HTTP/1.0" 200 42
Apr 20 19:08:09 ria mcblockd: [I] Added 185.36.100/22 (CZ) to www_losers for 30 days

191.101.117.226 - [20/Apr/2017:19:57:52] "POST /blog/xmlrpc.php HTTP/1.1" 500 -
Apr 20 19:57:52 ria mcblockd: [I] Added 191.101/16 (CL) to www_losers for 90 days

5.164.231.83 - [20/Apr/2017:20:12:07] "GET /blog/xmlrpc.php HTTP/1.0" 200 42
Apr 20 20:12:08 ria mcblockd: [I] Added 5.164/14 (RU) to www_losers for 90 days

160.202.162.204 - [22/Apr/2017:21:59:24] "GET /wp-login.php HTTP/1.1" 404 210
Apr 22 21:59:24 ria mcblockd: [I] Added 160.202.160/22 (KR) to www_losers for 90 days

104.173.193.176 - [23/Apr/2017:00:58:00] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 00:58:00 ria mcblockd: [I] Added 104.173.193/24 (US) to www_losers for 30 days

191.37.7.186 - [23/Apr/2017:04:18:19] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 04:18:19 ria mcblockd: [I] Added 191.37.0/17 (BR) to www_losers for 90 days

103.229.124.123 - [23/Apr/2017:07:50:15] "GET /xmlrpc.php HTTP/1.1" 404 208
Apr 23 07:50:15 ria mcblockd: [I] Added 103.229.124/22 (TW) to www_losers for 30 days

61.77.12.200 - [23/Apr/2017:09:40:35] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 09:40:36 ria mcblockd: [I] Added 61.72/13 (KR) to www_losers for 90 days

46.161.9.14 - [23/Apr/2017:10:30:24] "GET /blog/xmlrpc.php HTTP/1.0" 405 42
Apr 23 10:30:24 ria mcblockd: [I] Added 46.161.0/18 (RU) to www_losers for 90 days
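The log-matching side of this can be sketched in a few lines of Python.  This is not my actual log processor: the regular expression is fitted to the abbreviated log lines shown above, and the rule (any request for xmlrpc.php or wp-login.php is an offense) is an assumption made for the example.

```python
import re

# Matches the abbreviated access-log lines shown above:
#   ip - [timestamp] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) - \[(?P<when>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

# Hypothetical rule set: paths nothing legitimate should be requesting.
BAD_PATH_SUFFIXES = ("/xmlrpc.php", "/wp-login.php")


def offending_address(line):
    """Return the client address if the line matches a bad path, else None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    # Ignore any query string when comparing the path.
    path = m.group("path").split("?", 1)[0]
    if path.endswith(BAD_PATH_SUFFIXES):
        return m.group("ip")
    return None
```

Each address returned would then be handed to mcblockd over the secure channel; what mcblockd does with it (which prefix, how many days) is decided on the firewall side and isn’t modeled here.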

And yes, the threshold policy code works fine. Below is the result of someone trying to log in 5 times over a period of about 26 minutes. Since I have the threshold set to 5 times in 30 days, their rate was far above the threshold, even though this would be considered a ‘slow’ attempt by some measures.

Apr 21 17:08:59 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 1/5
Apr 21 17:08:59 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 2/5
Apr 21 17:22:14 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 3/5
Apr 21 17:22:14 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 4/5
Apr 21 17:35:21 ria mcblockd: [I] Added 69.162.73/24 (US) to ssh_losers for 30 days

And another over a period of about 91 minutes:

Apr 23 01:39:43 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 1/5
Apr 23 01:39:43 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 2/5
Apr 23 02:25:48 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 3/5
Apr 23 02:25:48 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 4/5
Apr 23 03:11:15 ria mcblockd: [I] Added 64.179.211/24 (CA) to ssh_losers for 30 days
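A threshold policy like this boils down to a sliding-window counter per prefix.  Here is a minimal Python sketch under that assumption; ThresholdTracker and its return values are made up for illustration and say nothing about how mcblockd stores its pending entries.

```python
from collections import defaultdict, deque


class ThresholdTracker:
    """Count offenses per prefix inside a sliding time window."""

    def __init__(self, count=5, window_seconds=30 * 24 * 3600):
        self.count = count
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def record(self, prefix, now):
        """Record one offense at time `now` (seconds).

        Returns ('pending', n) while below the threshold and
        ('added', n) when the threshold is reached.
        """
        q = self.hits[prefix]
        q.append(now)
        # Forget offenses that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        n = len(q)
        if n >= self.count:
            # Threshold reached: the prefix gets blocked, so reset its count.
            del self.hits[prefix]
            return ("added", n)
        return ("pending", n)
```

Feeding it the five timestamps from the first log excerpt above produces four pending results (1 through 4) followed by one added result, matching the Pending 1/5 … 4/5 and Added lines.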

Raspberry Pi garage door opener: part 9 (done)

Not much to say here. I’ve been using the garage door opener for many months and it just works and is very stable.

dwm@pi1:/home/dwm% uptime
 3:33AM  up 123 days,  4:25, 1 users, load averages: 0.40, 0.15, 0.10

dwm@pi1:/home/dwm% psg mcpigdod
USER  PID  %CPU %MEM   VSZ   RSS TT  STAT STARTED        TIME COMMAND
dwm   930   0.0  1.2 46452 11372  0- S    22Aug16  1748:10.15 mcpigdod

Raspberry Pi garage door opener: part 8

On Wednesday night I stuffed the enclosure with the Raspberry Pi, buttons, indicators and POE splitter after making all of the internal connections. I assembled the second Neutrik dataCON on the second rotary encoder. I temporarily taped my enclosure to the garage wall for testing, and connected the rotary encoders, door activation wires and the POE connection. I also attached the second magnetic door switch to the wall above the south door, and attached the magnet to the top of the door. I then did some basic testing. Both doors work correctly via the web app from my iPhone, and the rotary encoder connections work correctly.

On Thursday night I extended the wiring for the magnetic door switches (soldered joints and heat shrink), then sleeved the extensions with gray braided sleeve. Since I’m still waiting for a Neutrik jack for these, I’m temporarily using a dual row barrier strip to connect them to my PCB inside my enclosure.

Raspberry Pi garage door opener: part 7

I received my HAT PCBs that I designed for the garage door opener. I populated one of them and tested all of the outputs as well as the door closed switch inputs. Everything works. Yay! I will continue assembly tomorrow, and possibly test it wired into the garage doors.

Raspberry Pi garage door opener: Part 6

Today I connected the wiring for door activation from the garage door openers to the new screw terminal keystone jacks in the new wall plate in the garage. I also connected the cat5e to the yellow keystone jack. I then installed the new wall plate (stainless steel) into the new wall box. It looks clean and tidy, and I tested the door activation wiring.

I fabricated the remaining part of the rotary encoder mounts from electrical grade fiberglass angle. Both of the rotary encoders are now mounted.

I terminated the POE connection in my Leviton structured media enclosure in the basement. The jack is a yellow Leviton QuickPort, to identify it as needing POE. I used my Rhino labeler to put a heat shrink label on the cat5e cable before I punched it down on the jack. The jack is in a new Leviton 12-port jack panel that I bought to keep my POE jacks separate from non-POE jacks.

I installed the new POE switch in my rack in the basement. I then assembled a short cat5e patch cable and connected the POE switch to my main switch. I then connected one of the POE ports of the new switch to the new jack that leads to the wall plate in the garage. I connected my Raspberry Pi in the garage with a POE splitter. It works fine.

I drilled two more holes in the enclosure for the Raspberry Pi, and installed cable glands. One is for the wires to activate the garage doors, the other is for the cat5e cable. I haven’t decided how I’m going to connect the door switches yet. I’m leaning toward using a single Neutrik speakON 4-pole connector.

Raspberry Pi garage door opener: Part 5

Tonight I finished running cat5e from the new wall plate box in the garage to the basement. This was a difficult, sweaty job climbing around on trusses in the attic with fish tape (it was over 95F during the day today). But it’s done. I will terminate the ends with new jacks tomorrow.

Over this week I did some work on the enclosure for the Raspberry Pi, my HAT, buttons, indicators, POE splitter and jacks. I installed two Neutrik etherCON jacks in the enclosure for the rotary encoders, since those are on the top of the enclosure and I want to keep dust out of the connection. Fortunately the rotary encoder wires are correctly sized to use with a crimped RJ45. The door activation buttons and door status indicator LEDs are installed in the front cover. Everything appears to fit, though I will not receive my custom PCBs until Monday and hence can’t assemble the whole thing until next week.

I am also waiting on some cable glands, keystone inserts for the door activation wiring and the rotary encoder mounting piece I designed (which I ordered from Front Panel Express).

Raspberry Pi garage door opener: Part 4

I ordered brackets of my design from Front Panel Express to mount the rotary encoders.

I ordered Lovejoy couplings and fasteners from McMaster-Carr to connect the rotary encoders to the garage door shafts.

I ordered a Netgear ProSAFE JGS516PE 16-Port Gigabit Rackmount PoE switch with 8 PoE ports (85 W total). I’ve needed a PoE switch for a while, since I want to install a couple of PoE IP cameras. I also ordered 1000′ of yellow Cat5e cable to use for PoE applications. This will make it easy to identify ethernet cables that have PoE in my home, since my others are blue, grey or white. Finally, I ordered a PoE splitter with 5V microUSB output to power the Raspberry Pi.