A bit of history…
I started my computing career at the end of 1991 at NSFNET, which then became ANSnet. In those days, we had a home-brewed network monitoring system. I believe most/all of it was originally the brainchild of Bill Norton. Later there were several contributors: Linda Liebengood, myself, others. The important thing for today’s thoughts: it was named “rover”, and its user interface philosophy was simple but important: “Only show me actionable problems, and do it as quickly as possible.”
To understand this philosophy, you have to know something about the primary users: the network operators in the Network Operations Center (NOC). One of their many jobs was to observe problems, perform initial triage, and document their observations in a trouble ticket. From there they might fix the problem, escalate to network engineering, etc. But it wasn’t expected that we’d have some omniscient tool that could give them all of the data they (or anyone else) needed to resolve the problem. We expected everyone to use their brains, and we wanted our primary problem reporter to be fast and as clutter-free as possible.
For decades now, I’ve spent a considerable amount of time working at home. Sometimes because I was officially telecommuting, at other times just because I love my work and burn midnight hours doing it. As a result, my home setup has become more complex over time. I have 10 gigabit ethernet throughout the house (some fiber, some Cat6A). I have multiple 10 gigabit ethernet switches, all managed. I have three rackmount computers in the basement that run 7×24. I have ZFS pools on two of them, used for nightly backups of all networked machines, source code repository redundancy, Time Machine for my macOS machines, etc. I run my own DHCP service, an internal DNS server, web servers, an internal mail server, my own automated security software to keep my pf tables current, Unifi, etc. I have a handful of Raspberry Pis doing various things. Then there are all the other devices: desktop computers in my office, a networked laser printer, Roku, AppleTV, Android TV, Nest thermostat, Nest Protects, WiFi access points, laptops, tablet, phone, watch, Ooma, etc. And the list grows over time.
Essentially, my home has become somewhat complex. Without automation, I spend too much time checking the state of things or just being anxious about not having time to check everything at a reasonable frequency. Are my ZFS pools all healthy? Are all of my storage devices healthy? Am I running out of storage space anywhere? Is my DNS service working? Is my DHCP server working? My web server? NFS working where I need it? Is my Raspberry Pi garage door opener working? Are my domains resolvable from the outside world? Are the cloud services I use working? Is my Internet connection down? Is there a guest on my network? A bandit on my network? Is my printer alive? Is my internal mail service working? Are any of my UPS units running on battery? Are there network services running that should not be? What about the ones that should be, like sshd?
I needed a monitoring system that worked like rover; only show me actionable issues. So I wrote my own, and named it “mcrover”. It’s more of a host and service monitoring system than a network monitoring system, but it’s distributed and secure (using ed25519 stuff in libDwmAuth). It’s modern C++, relatively easy to extend, and has some fun bits (ASCII art in the curses client when there are no alerts, for example). Like the old Network Operations Center, I have a dedicated display in my office that only displays the mcrover Qt client, 24 hours a day. Since most of the time there are no alerts to display, the Qt client toggles between a display of the next week’s forecast and a weather radar image while it is idle. If there are alerts, the alert display will be shown instead, and will not go away until there are no alerts (or I click on the page switch in the UI). The dedicated display is driven by a Raspberry Pi 4B running the Qt client from boot, using EGLFS (no X11). The Raspberry Pi 4B is powered via PoE. It is also running the mcrover service, to monitor local services on the Pi as well as many network services. In fact the mcrover service is running on every 7×24 general purpose computing device. mcrover instances can exchange alerts, hence I only need to look at one instance to see what’s being reported by all instances.
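The page-switching behavior described above can be sketched in a few lines. This is only an illustration of the rule, not the actual mcrover Qt client code: the `Page` enum, the `SelectPage` function, and the `userSwitched` flag are all hypothetical names I'm using here, and I'm assuming the function is called on each display timer tick.

```cpp
#include <cstddef>

// Hypothetical page identifiers; the real mcrover Qt client may differ.
enum class Page { Alerts, Forecast, Radar };

// Decide which page to show on the next display tick (illustrative only).
//   alertCount      - number of active alerts known to this instance
//   lastWeatherPage - the weather page shown most recently
//   userSwitched    - true if the user clicked the page switch to leave
//                     the alert page while alerts are still active
Page SelectPage(std::size_t alertCount, Page lastWeatherPage,
                bool userSwitched)
{
    // Alerts win over everything unless the user explicitly switched away.
    if (alertCount > 0 && !userSwitched) {
        return Page::Alerts;
    }
    // Otherwise alternate between the two weather pages.
    return (lastWeatherPage == Page::Forecast) ? Page::Radar
                                               : Page::Forecast;
}
```

For example, `SelectPage(2, Page::Forecast, false)` yields `Page::Alerts`, while `SelectPage(0, Page::Forecast, false)` yields `Page::Radar`. The point of keeping the rule this small is that the display never needs any judgment from me: alerts preempt the weather, and the weather comes back on its own once the alerts clear.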
This has relieved me of a lot of sys admin and network admin drudgery. It wasn’t trivial to implement, mostly due to the variety (not the quantity) of things it’s monitoring. But it has proven itself very worthwhile. I’ve been running it for many months now, and I no longer get anxious about not always keeping up with things like daily/weekly/monthly mail from cron and manually checking things. All critical (and some non-critical) things are now being checked every 60 seconds, and I only have my attention stolen when there is an actionable issue found by mcrover.
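The "only report what's actionable" idea can be sketched roughly as follows. This is a minimal illustration and not how mcrover is actually structured: the `Check` struct and `CollectAlerts` function are hypothetical, each check is reduced to a callable returning true when healthy, and the 60-second scheduling loop is left out.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical representation of one monitored item (illustrative only).
struct Check {
    std::string            name;     // e.g. "dns" or "zfs pool"
    std::function<bool()>  healthy;  // returns true when all is well
};

// Run every check once and return only the names of the failing ones.
// Healthy checks produce no output at all, so an empty result means
// there is nothing that needs attention.
std::vector<std::string> CollectAlerts(const std::vector<Check>& checks)
{
    std::vector<std::string> alerts;
    for (const auto& check : checks) {
        if (!check.healthy()) {
            alerts.push_back(check.name);
        }
    }
    return alerts;
}
```

A caller would invoke `CollectAlerts` once per polling interval and hand the result to the display. With one passing check and one failing check, the returned vector holds only the failing check's name, which is the whole philosophy in miniature: successes are silent.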
So… an ode to the philosophy of an old system. Don’t make me plow through a bunch of data to find the things I need to address. I’ll do that when there’s a problem, not when there isn’t a problem. For 7×24 general purpose computing devices running Linux, macOS or FreeBSD, I install and run the mcrover service and connect it to the mesh. And it requires very little oomph; it runs just fine on a Raspberry Pi 3 or 4.
So why the weather display? It’s just useful to me, particularly in the mowing season when I need to plan ahead for yard work. And I’ve just grown tired of the weather websites. Most are loaded with ads and clutter. All of them are tracking us. Why not just pull the data from tax-funded sources in JSON form and do it myself? I’ve got a dedicated display which doesn’t have any alerts to display most of the time, so it made sense to put it there.
The Qt client using X11, showing the weather forecast.
The Qt client using X11, showing the weather radar.
The curses client, showing ASCII art since there are no alerts to be shown.