mcrover updates

I spent some weekend time updating mcrover this month. I had two motivations:

  1. I wanted deeper tests for two web applications: WordPress and piwigo. Plex will be on the list soon, I’m just not thrilled that their API is XML-based (Xerces is a significant dependency, and other XML libraries are not as robust or complete).
  2. I wanted a test for servers using libDwmCredence. Today that’s mcroverd and mcweatherd. dwmrdapd and mcblockd will soon be transitioned from libDwmAuth to libDwmCredence.

I created a web application alert, which allows me to write code to test web applications fairly easily and generically. Obviously, the test for a given application is application-specific. Applications with REST interfaces that emit JSON are pretty easy to add.
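As an example of the sort of check that's easy to write for a JSON-emitting REST interface: fetch a status URL, require an HTTP 200, and sanity-check the body. This is not the mcrover code, just a minimal sketch assuming libcurl and nlohmann/json are available; the notion of "healthy" here is a placeholder, and a real check would inspect application-specific fields.

#include <string>
#include <curl/curl.h>
#include <nlohmann/json.hpp>

//  libcurl write callback: append received bytes to a std::string.
static size_t AppendToString(char *data, size_t size, size_t nmemb, void *userp)
{
  static_cast<std::string *>(userp)->append(data, size * nmemb);
  return size * nmemb;
}

//  Returns true if the endpoint answers with HTTP 200 and a parsable JSON body.
bool WebAppHealthy(const std::string & url)
{
  CURL  *curl = curl_easy_init();
  if (! curl)  { return false; }
  std::string  body;
  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, AppendToString);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
  curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L);
  curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
  bool  healthy = false;
  if (curl_easy_perform(curl) == CURLE_OK) {
    long  status = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    if (200 == status) {
      //  The application-specific logic would go here; for the sketch,
      //  just require that the body parses as JSON.
      auto  json = nlohmann::json::parse(body, nullptr, false);
      healthy = (! json.is_discarded());
    }
  }
  curl_easy_cleanup(curl);
  return healthy;
}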

The alerts and code to check servers using libDwmCredence allow me to attempt connection and authentication to any service using Dwm::Credence::Peer. These are the steps that establish a connection; they are the prologue to application messaging. This allows me to check the status of a service using Dwm::Credence::Peer in lieu of a deeper (application-specific) check, so I can monitor new services very early in the deployment cycle.
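Stripped of the Credence specifics, the essence of that check is "can I connect to the service within a deadline, then complete the handshake?". Below is a minimal sketch of just the connect portion using plain POSIX sockets; it is not the Dwm::Credence::Peer API, and the real check also performs the key exchange and authentication over the connection before declaring success.

#include <cstring>
#include <string>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <unistd.h>
#include <sys/socket.h>

//  Returns true if a TCP connection to host:port completes within timeoutMs.
bool ServiceReachable(const std::string & host, const std::string & port,
                      int timeoutMs = 2000)
{
  struct addrinfo  hints, *res = nullptr;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family   = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0) {
    return false;
  }
  bool  ok = false;
  for (struct addrinfo *ai = res; ai && (! ok); ai = ai->ai_next) {
    int  fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)  { continue; }
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
      ok = true;
    }
    else {
      //  Non-blocking connect in progress: wait for writability, then
      //  check SO_ERROR to see whether the connection succeeded.
      struct pollfd  pfd = { fd, POLLOUT, 0 };
      if (poll(&pfd, 1, timeoutMs) == 1) {
        int  err = 0;  socklen_t  len = sizeof(err);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        ok = (0 == err);
      }
    }
    close(fd);
  }
  freeaddrinfo(res);
  return ok;
}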

Time for a new desktop keyboard: the switches

I just ordered a set of Kailh Box Pink key switches from novelkeys.xyz. And I’m going to order a barebones hot swap keyboard. Why?

First, I spilled some coffee in my desktop keyboard. Maybe 1/4 cup. Which is a lot, though nowhere near what I’ve spilled on some of my buckling spring keyboards. I tore it down, doused it with 99% isopropyl alcohol, cleaned all the switches, the PCB (both sides), the case (inside and out), the keycaps. It’s working again, but it’s a reminder….

It’s my oldest CODE keyboard, Cherry MX blue switches. I like the CODE keyboard. I don’t love the price though, and I also don’t love how difficult it is to disassemble. I’ve had it apart a couple of times. And this time, one of the case tabs broke. It’s not a huge issue, but it’s annoying. If I press on the lower right corner of the keyboard case… “SQUEAK”. It’s likely more the matter that I know it’s broken than it is any sort of real annoyance, but…

I’ve never loved Cherry MX switches. My computing life began when we had some truly delightful keyboards to type on. Before the IBM Model M (very good). Even before the IBM Model F (better). Anyone truly interested in how we got where we are today would do well to do some history homework online before making their next mechanical keyboard purchase. But I can sum it up in two words: cost reductions.

I’m not going to judge that. It is what it is, and the good far outweighs the bad; cost reductions are what have allowed many of us to own desktop and laptop computers for many decades.

But… there are those of us that spend our lives at our keyboards because it’s our livelihood. And there are a LOT of us. And some of us care deeply about the human-machine interface we spend 8+ hours at each day. And guess what? We’re not all the same. Unfortunately, we’ve mostly been saddled with only two predominant types of key switch for a very long time now: rubber domes (pretty much universally considered crummy but inexpensive), and Cherry MX (or something similar to it). We do still have Unicomp making buckling spring keyboards with tooling from Lexmark (who manufactured the Model M keyboards for the North American market). And we have some new switch types (optical, hall effect, etc.). But at the moment, the keyboard world is predominantly Cherry colored.

Perhaps worse, those of us that like a switch to be both tactile and clicky have few good choices. Unicomp buckling spring is at the top for readily available and reasonably priced. But the compromises for a modern desktop are significant for a large crowd of us. Number one would be that it’s huge (including the SSK version). For some, number two would be no backlighting. And yet others want more keycap options. But it’s a long drop from the buckling spring to any MX-style switch if your goal is clicky and tactile.

I don’t hate Cherry MX blues. Nor Gateron blues. Nor many others. But… most of them feel like just what they’re designed to be. They’re not smooth (including the linears), most of them are not truly tactile, and they’re fragile (not protected from dust or liquid ingress). Most have considerable key wobble. They’re usable, I’ve just been wanting something better for a while. In a TKL (tenkeyless) keyboard with minimal footprint.

Some personal history… one of the reasons I stopped using buckling spring was just the sheer size of any of my true Model M keyboards or any of the Unicomps. The other was the activation force. I wanted something a little lighter to the touch, but still noisy. The Cherry MX blue and similar have filled the niche for me for a while. But… the scratchiness/crunchiness has always been less than ideal to me, and the sound in any board I’ve tried has been less satisfying than I’d like. I’ve not had any of the switches die on me, which is a testament to durability. But I’ve had to clean them on more than one occasion due to a small spill, to get them working again. And over time, despite the fact that they still function, their characteristics change. Some keys lose some of their clickiness. Some get louder. And out of the box, they’re not terribly consistent sound-wise. And while I’ve disassembled keyboards to clean and lube the switches… it’s very time consuming. And despite the fact that I have a pretty good hot-air rework setup, it’s very hard for me to justify spending time replacing soldered switches. I can barely justify time swapping hot-swap keys!

So… I want a more durable switch. And something smoother (less scratch/crunch) than a Cherry MX, but with a nice distinct click. And unfortunately, something that works in a PCB designed for Cherry MX, since that’s where far and away the most options are. The Kailh Box White or Box Pink seem to fit the bill. The whites are readily available, so I bought the pinks first to make sure I don’t miss out on available inventory. I’ll put them in a hot-swap board with PBT keycaps and give them a test drive for a few weeks.

I know the downsides ahead of time. I had an adjustment to make when I went from buckling spring to Cherry MX blue. Buckling spring feedback and activation occur at the same time; it’s ideal. Cherry MX and related designs… most of them activate after the click. The Kailh pink and white appear to activate before the click, and they don’t have the hysteresis of the Cherry MX switches. But based on my own personal preferences which are aligned pretty closely to those who’ve reviewed the Kailh Box White and Kailh Box Pink (like Chyrosran on YouTube), I think one of these switches will make me happier than my MX blues.

Of course I could be wrong. But that’s why I’m going with an inexpensive hot-swap board for this test drive. PCB, mounting and chassis all play a significant role in how a keyboard feels and sounds. But I know many of those differences, and the goal at the moment is to pick the switches I want in my next long-term keyboard.

Threadripper 3960X: the birth of ‘thrip’

I recently assembled a new workstation for home. My primary need was a machine for software development, including deep learning. This machine is named “thrip”.

Having looked hard at my options, I decided on AMD Threadripper 3960X as my CPU. A primary driver was of course bang for the buck. I wanted PCIe 4.0, at least 18 cores, at least 4-channel RAM, the ability to utilize 256G or more of RAM, and to stay in budget.

By CPU core count alone, the 3960X is over what I needed. On the flip side, it’s constrained to 256G of RAM, and it’s also more difficult to keep cool than most CPUs (280W TDP). But on price-per-core, and overall performance per dollar, it was the clear winner for my needs.

Motherboard-wise, I wanted 10G ethernet, some USB-C, a reasonable number of USB-A ports, room for 2 large GPUs, robust VRM, and space for at least three NVMe M.2 drives. Thunderbolt 3 would have been nice, but none of the handful of TRX40 boards seem to officially support it (I don’t know if this is an Intel licensing issue or something else). The Gigabyte board has the header and Wendell@Level1Techs seems to have gotten it working, but I didn’t like other aspects of the Gigabyte TRX40 AORUS EXTREME board (the XL-ATX form factor, for example, is still limiting in terms of case options).

I prefer to build my own workstations. It’s not due to being particularly good at it, or winding up with something better than I could get pre-built. It’s that I enjoy the creative process of selecting parts and putting it all together.

I had not assembled a workstation in quite some time. My old i7-2700K machine has met my needs for most of the last 8 years. And due to a global pandemic, it wasn’t a great time to build a new computer. The supply chain has been troublesome for over 6 months now, especially for some specific parts (1000W and above 80+ titanium PSUs, for example). We’ve also had a huge availability problem for the current GPUs from NVIDIA (RTX 3000 series) and AMD (Radeon 6000 series). And I wasn’t thrilled about doing a custom water-cooling loop again, but I couldn’t find a worthy quiet cooling solution for Threadripper and 2080ti without going custom loop. Given the constraints, I wound up with these parts as the guts:

  • Asus TRX40 ROG Zenith II Extreme Alpha motherboard
  • AMD Threadripper 3960X CPU (24 cores)
  • 256 gigabytes G.Skill Trident Z Neo Series RGB DDR4-3200 CL16 RAM (8 x 32G)
  • EVGA RTX 2080 Ti FTW3 Ultra GPU with EK Quantum Vector FTW3 waterblock
  • Sabrent 1TB Rocket NVMe 4.0 Gen4 PCIe M.2 Internal SSD
  • Seasonic PRIME TX-850, 850W 80+ Titanium power supply
  • Watercool HEATKILLER IV PRO for Threadripper, pure copper CPU waterblock

It’s all in a Lian Li PC-O11D XL case. I have three 360mm radiators, ten Noctua 120mm PWM fans, an EK Quantum Kinetic TBE 200 D5 PWM pump, PETG tubing and a whole bunch of Bitspower fittings.

My impressions thus far: it’s fantastic for Linux software development. It’s so nice to be able to run ‘make -j40’ on large C++ projects and have them complete in a timely manner. And thus far, it runs cool and very quiet.

An ode to NSFNET and ANSnet: a simple NMS for home

A bit of history…

I started my computing career at NSFNET at the end of 1991. Which then became ANSnet. In those days, we had a home-brewed network monitoring system. I believe most/all of it was originally the brainchild of Bill Norton. Later there were several contributors; Linda Liebengood, myself, others. The important thing for today’s thoughts: it was named “rover”, and its user interface philosophy was simple but important: “Only show me actionable problems, and do it as quickly as possible.”

To understand this philosophy, you have to know something about the primary users: the network operators in the Network Operations Center (NOC). One of their many jobs was to observe problems, perform initial triage, and document their observations in a trouble ticket. From there they might fix the problem, escalate to network engineering, etc. But it wasn’t expected that we’d have some omniscient tool that could give them all of the data they (or anyone else) needed to resolve the problem. We expected everyone to use their brains, and we wanted our primary problem reporter to be fast and as clutter-free as possible.

For decades now, I’ve spent a considerable amount of time working at home. Sometimes because I was officially telecommuting, at other times just because I love my work and burn midnight hours doing it. As a result, my home setup has become more complex over time. I have 10 gigabit ethernet throughout the house (some fiber, some Cat6A).  I have multiple 10 gigabit ethernet switches, all managed.  I have three rackmount computers in the basement that run 7×24.  I have ZFS pools on two of them, used for nightly backups of all networked machines, source code repository redundancy, Time Machine for my macOS machines, etc.  I run my own DHCP service, an internal DNS server, web servers, an internal mail server, my own automated security software to keep my pf tables current, Unifi, etc.  I have a handful of Raspberry Pis doing various things.  Then there are all the other devices: desktop computers in my office, a networked laser printer, Roku, AppleTV, Android TV, Nest thermostat, Nest Protects, WiFi access points, laptops, tablet, phone, watch, Ooma, etc.  And the list grows over time.

Essentially, my home has become somewhat complex.  Without automation, I spend too much time checking the state of things or just being anxious about not having time to check everything at a reasonable frequency.  Are my ZFS pools all healthy?  Are all of my storage devices healthy?  Am I running out of storage space anywhere?  Is my DNS service working?  Is my DHCP server working?  My web server?  NFS working where I need it?  Is my Raspberry Pi garage door opener working?  Are my domains resolvable from the outside world?  Are the cloud services I use working?  Is my Internet connection down?  Is there a guest on my network?  A bandit on my network?  Is my printer alive?  Is my internal mail service working?  Are any of my UPS units running on battery?  Are there network services running that should not be?  What about the ones that should be, like sshd?

I needed a monitoring system that worked like rover; only show me actionable issues.  So I wrote my own, and named it “mcrover”.  It’s more of a host and service monitoring system than a network monitoring system, but it’s distributed and secure (using ed25519 stuff in libDwmAuth).  It’s modern C++, relatively easy to extend, and has some fun bits (ASCII art in the curses client when there are no alerts, for example).  Like the old Network Operations Center, I have a dedicated display in my office that only displays the mcrover Qt client, 24 hours a day.  Since most of the time there are no alerts to display, the Qt client toggles between a display of the next week’s forecast and a weather radar image when there are no alerts.  If there are alerts, the alert display will be shown instead, and will not go away until there are no alerts (or I click on the page switch in the UI).  The dedicated display is driven by a Raspberry Pi 4B running the Qt client from boot, using EGLFS (no X11).  The Raspberry Pi4 is powered via PoE.  It is also running the mcrover service, to monitor local services on the Pi as well as many network services.  In fact the mcrover service is running on every 7×24 general purpose computing device.  mcrover instances can exchange alerts, hence I only need to look at one instance to see what’s being reported by all instances.

This has relieved me of a lot of sys admin and network admin drudgery.  It wasn’t trivial to implement, mostly due to the variety (not the quantity) of things it’s monitoring.  But it has proven itself very worthwhile.  I’ve been running it for many months now, and I no longer get anxious about not always keeping up with things like daily/weekly/monthly mail from cron and manually checking things.  All critical (and some non-critical) things are now being checked every 60 seconds, and I only have my attention stolen when there is an actionable issue found by mcrover.
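The "only show me actionable problems" core is tiny; all of the real work lives in the individual checks. A minimal sketch of the idea (not the mcrover code, which is distributed, authenticated and exchanges alerts between instances):

#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

//  One named check: returns true when healthy, false when actionable.
struct Check {
  std::string            name;
  std::function<bool()>  healthy;
};

//  Run every check once per minute and emit output only for the failures.
void MonitorLoop(const std::vector<Check> & checks)
{
  for (;;) {
    for (const auto & c : checks) {
      if (! c.healthy()) {
        std::cerr << "ALERT: " << c.name << '\n';
      }
    }
    std::this_thread::sleep_for(std::chrono::seconds(60));
  }
}

In practice, healthy() is where the variety lives: ZFS pool status, DNS and DHCP responses, NFS mounts, and the rest of the list above.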

So… an ode to the philosophy of an old system.  Don’t make me plow through a bunch of data to find the things I need to address.  I’ll do that when there’s a problem, not when there isn’t a problem.  For 7×24 general purpose computing devices running Linux, macOS or FreeBSD, I install and run the mcrover service and connect it to the mesh.  And it requires very little oomph; it runs just fine on a Raspberry Pi 3 or 4.

So why the weather display?  It’s just useful to me, particularly in the mowing season where I need to plan ahead for yard work.  And I’ve just grown tired of the weather websites.  Most are loaded with ads and clutter.  All of them are tracking us.  Why not just pull the data from tax-funded sources in JSON form and do it myself?  I’ve got a dedicated display which doesn’t have any alerts to display most of the time, so it made sense to put it there.
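For reference, the forecast data comes from the National Weather Service’s JSON API at api.weather.gov. A rough, self-contained sketch of pulling a forecast (not the mcrover code; the coordinates are placeholders, and the field names are from my recollection of the NWS responses, so verify against the live API):

#include <iostream>
#include <string>
#include <curl/curl.h>
#include <nlohmann/json.hpp>

static size_t Append(char *d, size_t sz, size_t n, void *userp)
{
  static_cast<std::string *>(userp)->append(d, sz * n);
  return sz * n;
}

//  Fetch a URL body into a string (errors ignored for brevity).
static std::string Fetch(const std::string & url)
{
  std::string  body;
  CURL  *curl = curl_easy_init();
  if (curl) {
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, Append);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "weather-sketch");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return body;
}

int main()
{
  //  api.weather.gov wants a lat,lon "points" lookup first; the response
  //  contains the URL of the actual forecast.  Coordinates are placeholders.
  auto  points =
    nlohmann::json::parse(Fetch("https://api.weather.gov/points/42.28,-83.74"),
                          nullptr, false);
  if (points.is_discarded())  { return 1; }
  auto  forecast =
    nlohmann::json::parse(Fetch(points["properties"]["forecast"].get<std::string>()),
                          nullptr, false);
  if (forecast.is_discarded())  { return 1; }
  for (const auto & period : forecast["properties"]["periods"]) {
    std::cout << period["name"].get<std::string>() << ": "
              << period["detailedForecast"].get<std::string>() << '\n';
  }
  return 0;
}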

The Qt client using X11, showing the weather forecast.

The Qt client using EGLFS, showing the weather radar.

The curses client, showing ASCII art since there are no alerts to be shown.

Apple M1 thoughts

Apple silicon has arrived for the Mac. Not in my hands, but it has arrived.

My thoughts…

Wow. I’m hesitant to call it revolutionary, simply because they’ve been on this path for over a decade. But I’m wowed for a number of reasons.

From the benchmarks I’ve seen, as well as the reviews, the performance isn’t what has wowed me. Yes, it’s impressive. But we had seen enough from iPhones to iPad Pros to know full well what we could expect from Apple’s first generation of their own SoC for the Mac. And they hit the marks.

I think what had the most profound impact on me was just the simple fact that they delivered on a promise to themselves, their users and their company. This wasn’t a short road! In this day and age, there are almost no technology companies that can stick the landing on a 10-year roadmap. Heck, many tech companies abandon their products and users in the span of a few years. Apple quietly persevered. They didn’t fall prey to hubris and conceit. They didn’t give us empty promises. They kept plugging away behind the scenes while Intel and others floundered, or overpromised and underdelivered, or just believed that the x86 architecture would be king forever. And much of this work happened after the passing of Steve Jobs. So to those who thought Apple would flounder without him… I think you’ve been wrong all along.

It’s not like I didn’t see this coming; it’s been rumored for what seems like forever. But I hadn’t really reflected on the potential impact until it arrived. Some background…

I’m a Mac user, and I love macOS. But I’m a software developer, and the main reason I love macOS is that it’s a UNIX.  I like the user interface more than any other, but I spend most of my time in a terminal window running emacs, clang++, etc.  Tasks served well by any UNIX. For me, macOS has been the best of two worlds. I shunned OS 9; I loved the Mac, but OS 9 fell short of my needs. When OS X arrived, I was on board. Finally an OS I could use for my work AND heartily recommend to non-techies. And the things I liked about NeXT came along for the ride.

The other reason I’ve loved Macs: the quality of Apple laptops has been exceptional for a very long time. With the exception of the butterfly keyboard fiasco and the still-mostly-useless Touch Bar (function keys are WAY more useful for a programmer), I’ve been very happy with my Mac laptops. Literally nothing else on the market has met my needs as well as a Macbook Pro, going back longer than I can remember.

But now… wow. Apple just put a stake in the ground that’s literally many miles ahead of everyone else in the personal computing space. It’s akin to the Apollo moon landing. We all saw it coming, but now the proof has arrived.

To be clear, the current M1 lineup doesn’t fit my needs. I’m not in the market for a Macbook Air or a 13″ Macbook Pro. I need a screen larger than 13″, and some of my development needs don’t fit a 16G RAM limitation, which also rules out the M1 Mac Mini (as does the lack of 10G ethernet). And like any first generation product, there are some quirks that have yet to be resolved (issues with some ultra wide monitors), missing features (no eGPU support), etc. But… for many users, these new machines are fantastic and there is literally nothing competitive. Just look at the battery life on the M1 Macbook Air and Macbook Pro 13″. Or the Geekbench scores. Or how little power they draw whether on battery or plugged into the wall. There’s no fan in the M1 Macbook Air because it doesn’t need one.

Of course, for now, I also need full x64 compatibility. I run Windows and other VMs on my Macs for development purposes, and as of right now I can’t do that on an M1 Mac. That will come if I’m to believe Parallels, but it won’t be native x64, obviously. But at least right now, Rosetta 2 looks reasonable. And it makes sense versus the original Rosetta, for a host of reasons I won’t delve into here.

Where does this leave Intel? I don’t see it as significant right now. Apple is and was a fairly small piece of Intel’s business. Today, Intel is in much bigger trouble from AMD EPYC, Threadripper, Threadripper Pro and Ryzen 3 than Apple silicon. That could change, but I don’t see Apple threatening Intel.  Apple has no products in Intel’s primary business (servers). Yes, what Apple has done is disruptive, in a good way. But the long-term impact is yet to be seen.

I am looking forward to what comes next from Apple. Something I haven’t been able to say about Intel CPUs in quite some time. Don’t get me wrong; I’m a heavy FreeBSD and Linux user as well.  Despite the age of x86/x64, we do have interesting activity here.  AMD Threadripper, EPYC and Ryzen 3 are great for many of my needs and have put significant pressure on Intel. But I believe that once Apple releases a 16″ Macbook Pro with their own silicon and enough RAM for my needs… there will literally be nothing on the market that comes even close to what I want in a laptop, for many years. It will be a solid investment.

For the long run… Apple has now finally achieved what they’ve wanted since their inception: control of their hardware and software stack across the whole product lineup. Exciting times. Real competition in the space that’s long been dominated by x86/x64, which will be good for all of us as consumers. But make no mistake: Apple’s success here isn’t easily duplicated. Their complete control over the operating system and the hardware is what has allowed them to do more (a LOT more) with less power. This has been true on mobile devices for a long time, and now Apple has brought the same synergies to bear on the PC market. As much as I appreciate Microsoft and Qualcomm SQ1 and SQ2 Surface Pro X efforts, they are far away from what Apple has achieved.

One thing that continues to befuddle me about what’s being written by some… things like “ARM is now real competition for x86/x64”.  Umm… ARM’s relevance hasn’t changed. They license reference core architectures and instruction sets. Apple is not building ARM reference architectures. If ARM were the one deserving credit here, we’d have seen similar success for Windows and Linux on ARM. ARM is relevant. But to pretend that Apple M1 silicon is just a product of ARM, and that there’s now some magic ARM silicon that’s going to go head-to-head with x86/x64 across the industry, is pure uninformed folly. M1 is a product of Apple, designed specifically for macOS and nothing else. All of the secret sauce here belongs to Apple, not ARM.

I’ve also been seeing writers say that this might prompt Microsoft and others to go the SoC route. Anything is possible. But look at how long it took Apple to get to this first generation for the Mac, and consider how they did it: mobile first, which brought unprecedented profits and many generations of experience. Those profits allowed them to bring in the talent they needed, and the very rapid growth of mobile allowed them to iterate many times in a fairly short span of time. Wash, rinse, repeat. Without the overhead of owning the fab. And for what many have considered a ‘dead’ market (personal computers). Yes, PC sales have on average been on a steady decline for some time. But the big picture is more complex; it’s still the case that a smartwatch isn’t a smartphone, a smartphone isn’t a tablet, a tablet isn’t a laptop, a laptop isn’t a desktop, most desktops are not workstations, a workstation isn’t a storage server, etc. What we’ve seen is the diversification of computing. The average consumer doesn’t need a workstation. Many don’t need a desktop, and today they have other options for their needs. But the desktop and workstation market isn’t going to disappear. We just have a lot more options to better fit our needs than we did when smartphones, tablets, ultrabooks, etc. didn’t exist.

I’ve always been uneasy with those who’ve written that Apple would abandon the PC market. The Mac business, standalone, generated 28.6 billion U.S. dollars in 2020. That would be at spot 111 on the Fortune 500 list. Not to mention that Apple and all the developers writing apps for Apple devices need Macs. The fact that Apple’s desktop business is a much smaller portion of their overall revenue isn’t a product of it being a shrinking business; it’s 4X larger in revenue than it was 20 years ago. The explosive growth in mobile has dwarfed it, but it has continued to be an area of growth for Apple. Which is not to say that I haven’t bemoaned the long delays between releases of Apple professional Mac desktops, not to mention the utter disaster of the 2013 Mac Pro. But Apple is notoriously tight-lipped about their internal work until it’s ready to ship, and it’s clear now that they wisely directed their resources at decoupling their PC fates from Intel.  None of this would have happened if Apple’s intent was to abandon personal computers.

So we enter a new era of Apple. Rejoice, whether you’re an Apple user or not. Innovation spurs further innovation.

Replaced IronWolf Pro 8TB with Ultrastar DC HC510 10TB

Due to a firmware problem in the Seagate IronWolf Pro 8TB drives that makes them incompatible with ZFS on FreeBSD, I returned them over the weekend and ordered a pair of Ultrastar DC HC510 10TB drives. I’ve had phenomenal results from Ultrastars in the past, and as near as I can tell they’ve always been very good enterprise-grade drives regardless of the owner (IBM, Hitachi, HGST, Western Digital). The Ultrastars arrived today, and I put them in the zfs1 pool:

# zpool list -v
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfs1              16.3T  2.13T  14.2T        -         -    10%    13%  1.00x  ONLINE  -
  mirror          3.62T  1.53T  2.09T        -         -    29%    42%
    gpt/gpzfs1_0      -      -      -        -         -      -      -
    gpt/gpzfs1_1      -      -      -        -         -      -      -
  mirror          3.62T   609G  3.03T        -         -    19%    16%
    gpt/gpzfs1_2      -      -      -        -         -      -      -
    gpt/gpzfs1_3      -      -      -        -         -      -      -
  mirror          9.06T  1.32M  9.06T        -         -     0%     0%
    gpt/gpzfs1_4      -      -      -        -         -      -      -
    gpt/gpzfs1_5      -      -      -        -         -      -      -

Everything seems good. Note that the scrub repair of 33.8G was due to me pulling the IronWolf drives from the chassis with the system live (after having removed them from the pool). This apparently caused a burp on the backplane, which was fully corrected by the scrub.

# zpool status
  pool: zfs1
 state: ONLINE
  scan: scrub repaired 33.8G in 0 days 04:43:10 with 0 errors on Sun Nov 10 01:45:59 2019
remove: Removal of vdev 2 copied 36.7G in 0h3m, completed on Thu Nov  7 21:26:09 2019
    111K memory used for removed device mappings
config:

	NAME              STATE     READ WRITE CKSUM
	zfs1              ONLINE       0     0     0
	  mirror-0        ONLINE       0     0     0
	    gpt/gpzfs1_0  ONLINE       0     0     0
	    gpt/gpzfs1_1  ONLINE       0     0     0
	  mirror-1        ONLINE       0     0     0
	    gpt/gpzfs1_2  ONLINE       0     0     0
	    gpt/gpzfs1_3  ONLINE       0     0     0
	  mirror-3        ONLINE       0     0     0
	    gpt/gpzfs1_4  ONLINE       0     0     0
	    gpt/gpzfs1_5  ONLINE       0     0     0

errors: No known data errors

Expanded zfs1 pool on kiva

I purchased two Seagate IronWolf Pro 8TB drives at MicroCenter today. They’ve been added to the zfs1 pool on kiva.


# gpart create -s gpt da5
da5 created
# gpart create -s gpt da6
da6 created

# gpart add -t freebsd-zfs -l gpzfs1_4 -b1M -s7450G da5
da5p1 added
# gpart add -t freebsd-zfs -l gpzfs1_5 -b1M -s7450G da6
da6p1 added

# zpool add zfs1 mirror /dev/gpt/gpzfs1_4 /dev/gpt/gpzfs1_5

# zpool list -v
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfs1              14.5T  2.76T  11.7T        -         -    14%    19%  1.00x  ONLINE  -
  mirror          3.62T  1.87T  1.75T        -         -    33%    51%
    gpt/gpzfs1_0      -      -      -        -         -      -      -
    gpt/gpzfs1_1      -      -      -        -         -      -      -
  mirror          3.62T   910G  2.74T        -         -    24%    24%
    gpt/gpzfs1_2      -      -      -        -         -      -      -
    gpt/gpzfs1_3      -      -      -        -         -      -      -
  mirror          7.25T  1.05M  7.25T        -         -     0%     0%
    gpt/gpzfs1_4      -      -      -        -         -      -      -
    gpt/gpzfs1_5      -      -      -        -         -      -      -

You may need 10 gigabit networking at home

Given the ignorance I’ve seen in some forums with respect to the need for 10 gigabit networking at home, I decided it was time to blog about it.

The argument against 10 gigabit networking at home for ANYONE, as I’ve seen posted in discussions on arstechnica and other sites, is ignorant. The same arguments were made when we got 100baseT, and again when we got 1000baseT at consumer pricing. And they were proven wrong, just as they will be for 10G whether it’s 10GBaseT, SFP+ DAC, SFP+ SR optics, or something else.

First, we should disassociate the WAN connection (say Comcast, Cox, Verizon, whatever) from the LAN. If you firmly believe that you don’t need LAN speeds that are higher than your WAN connection, I have to assume that you either a) do very little within your home that doesn’t use your WAN connection or b) just have no idea what the heck you’re talking about. If you’re in the first camp, you don’t need 10 gigabit LAN in your home. If you’re in the second camp, I can only encourage you to learn a bit more and use logic to determine your own needs. And stop telling others what they don’t need without listening to their unique requirements.

There are many of us with needs for 10 gigabit LAN at home. Let’s take my needs, for example, which I consider modest. I have two NAS boxes with ZFS arrays. One of these hosts some automated nightly backups, a few hundred movies (served via Plex) and some of my music collection. The second hosts additional automated nightly backups, TimeMachine instances and my source code repository (which is mirrored to the first NAS with ZFS incremental snapshots).

At the moment I have 7 machines that run automated backups over my LAN. I consider these backups critical to my sanity, and they’re well beyond what I can reasonably accomplish via a cloud storage service. With data caps and my outbound bandwidth constraints, nightly cloud backups aren’t an option. Fortunately, I am not in desperate need of offsite backups, and the truly critical stuff (like my source code repository) is mirrored in a lot of places and occasionally copied to DVD for offsite storage. I’m not sure what I’ll do the day my source code repository gets beyond what I can reasonably burn to DVD but it’ll be a long while before I get there (if ever). If I were to have a fire, I’d only need to grab my laptop on my way out the door in order to save my source code. Yes, I’d lose other things. But…

Fires are rare. I hope to never have one. Disk failures are a lot less rare. As are power supply failures, fan failures, etc. This is the main reason I use ZFS. But, at 1 gigabit/second network speeds, the network is a bottleneck for even a lowly single 7200 rpm spinning drive doing sequential reads. A typical decent single SATA SSD will fairly easily reach 4 gigabits/second. Ditto for a small array of spinning drives. NVME/M.2/multiple SSD/larger spinning drive array/etc. can easily go beyond 10 gigabits/second.
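To put rough numbers on that: a SATA SSD sustaining around 550 megabytes/second of sequential reads is about 4.4 gigabits/second, and even a single 7200 rpm drive at around 200 megabytes/second is about 1.6 gigabits/second. Both are more than a 1 gigabit/second link can carry.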

Why does this matter? When a backup kicks off and saturates a 1 gigabit/second network connection, that connection becomes a lot less usable for other things. I’d prefer the network connection not be saturated, and that the backup complete as quickly as possible. In other words, I want to be I/O bound in the storage subsystem, not bandwidth bound in the network. This becomes especially critical when I need to restore from a backup. Even if I have multiple instances of a service (which I do in some cases), there’s always one I consider ‘primary’ and want to restore as soon as possible. And if I’m restoring from backup due to a security breach (hasn’t happened in 10 years, knock on wood), I probably can’t trust any of my current instances and hence need a restore from backup RIGHT NOW, not hours later. The faster a restoration can occur (even if it’s just spinning up a VM snapshot), the sooner I can get back to doing real work.

Then there’s just the shuffling of data. Once in a while I mirror all of my movie files, just so I don’t have to re-rip a DVD or Blu-Ray. Some of those files are large, and a collection of them is very large. But I have solid state storage in all of my machines and arrays of spinning drives in my NAS machines. Should be fast to transfer the files, right? Not if your network is 1 gigabit/second… your average SATA SSD will be 75% idle while trying to push data through a 1 gigabit/second network, and NVMe/M.2/PCIe solid state will likely be more than 90% idle. In other words, wasting time. And time is money.

So, some of us (many of us if you read servethehome.com) need 10 gigabit networking at home. And it’s not ludicrously expensive anymore, and prices will continue to drop. While I got an exceptional deal on my Netgear S3300-52X-POE+ main switch ($300), I don’t consider it a major exception. New 8-port 10GbaseT switches are here for under $1000, and SFP+ switches are half that price new (say a Ubiquiti ES-16-XG which also has 4 10GbaseT ports). Or buy a Quanta LB6M for $300 and run SR optics.

Right now I have a pair of Mellanox ConnectX-2 EN cards for my web server and my main NAS, which I got for $17 each. Two 3-meter DAC cables for $15 each from fs.com connect these to the S3300-52X-POE+ switch’s SFP+ ports. In my hackintosh desktop I have an Intel X540-T2 card, which is connected to one of the 10GbaseT ports on the S3300-52X-POE+ via cat6a shielded keystones and cables (yes, my patch panels are properly grounded). I will eventually change the X540-T2 to a less power-hungry card, but it works for now and it was $100. I expect to see more 10GbaseT price drops in 2018. And I hope to see more options for mixed SFP+ and 10GbaseT in switches.

We’re already at the point where copper has become unwieldy, since cat6a (esp. shielded) and cat7 are thick, heavy cables. And cat8? Forget about running much of that since it’s a monster size-wise. At 10 gigabits/second, it already makes sense to run multimode fiber for EMI immunity, distance, raceway/conduit space, no code violations when co-resident with AC power feeds, etc. Beyond 10 gigabits/second, which we’ll eventually want and need, I don’t see copper as viable. Sure, copper has traditionally been easier to terminate than fiber. But in part that’s because the consumer couldn’t afford or justify the need for it and hence fiber was a non-consumer technology. Today it’s easier to terminate fiber than it’s ever been, and it gets easier all the time. And once you’re pulling a cat6a or cat8 cable, you can almost fit an OM4 fiber cable with a dual LC connector on it through the same spaces and not have to field terminate at all. That’s the issue we’re facing with copper. Much like the issues with CPU clock speeds, we’re reaching the limits of what can reasonably be run on copper over typical distances in a home (where cable routes are often far from the shortest path from A to B). In a rack, SFP+ DAC (Direct Attach Copper) cables work well. But once you leave the rack and need to go through a few walls, the future is fiber.

And it’ll arrive faster than some people expect, in our homes. Just check what it takes to send 4K raw video at 60fps. Or to backhaul an 802.11ac Wave 2 WiFi access point without creating a bottleneck on the wired network. Or the time required to send that 4TB full backup to your NAS.

OK, I feel better. 🙂 I had to post about this because it’s just not true that no one needs 10 gigabit networking in their home. Some people do need it.

My time is valuable, as is yours. Make your own decisions about what makes sense for your own home network based on your own needs. If you don’t have any pain points with your existing network, keep it! Networking technology is always cheaper next year than it is today. But if you can identify pain that’s caused by bandwidth constraints on your 1 gigabit network, and the pain warrants an upgrade to 10 gigabit (even if only between 2 machines), by all means go for it! I don’t know anyone that’s ever regretted a network upgrade that was well considered.

Note that this post came about partly due to some utter silliness I’ve seen posted online, including egregiously incorrect arithmetic. One of my favorites was from a poster on arstechnica who repeatedly (as in dozens of times) claimed that no one needed 10 gigabit ethernet at home because he could copy a 10 TB NAS to another NAS in 4 hours over a 1 gigabit connection. So be careful what you read on the Internet, especially if it involves numbers… it might be coming from someone with faulty arithmetic that certainly hasn’t ever actually copied 10 terabytes of data over a 1 gigabit network in 4 hours (hint… it would take almost 24 hours if it has the network all to itself, longer if there is other traffic on the link).
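For the record, the arithmetic: 10 terabytes is 80 terabits, and 80 trillion bits divided by 1 gigabit/second is 80,000 seconds, or roughly 22 hours, before any protocol overhead. Copying it in 4 hours would require sustaining more than 5.5 gigabits/second.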

I’d be remiss if I didn’t mention other uses for 10 gigabit ethernet. Does your household do a lot of gaming via Steam? You’d probably benefit from having a local Steam cache with 10 gigabit connectivity to the gaming machines. Are you running a bunch of Windows 10 instances? You can pull down updates to one machine and distribute them from there to all of your Windows 10 instances, and the faster, the better. Pretty much every scenario where you need to move large pieces of data over the network will benefit from 10 gigabit ethernet. You have to decide for yourself if the cost is justified. In my case, I’ve installed the bare minimum (4 ports of 10 gigabit) that alleviates my existing pain points. At some point in the future I’ll need more 10 gigabit ports, and as long as it’s not in the next few months, it’ll be less expensive than it is today. But if you could use it today, take a look at your inexpensive options. Mellanox ConnectX-2 EN cards are inexpensive on eBay, and even the newer cards aren’t ludicrously expensive. If you only need 3 meters or less of distance, look at using SFP+ DAC cables. If you need more distance, look at using SR optical transceivers in Mellanox cards or Intel X540-DA2 (or newer) and fiber, or 10GbaseT (Intel X540-T2 or X540-T1 or newer, or a motherboard with on-board 10GbaseT). You have relatively inexpensive switch options if you’re willing to buy used on eBay and only need a few ports at 10 gigabit, or you’re a techie willing to learn to use a Quanta LB6M and can put it somewhere where it won’t drive you crazy (it’s loud).

mcperf: a multithreaded bandwidth tester

I’ve been really dismayed by the lack of decent simple tools for testing the available bandwidth between a pair of hosts above 1 gigabit/second. Back when I didn’t have any 10 gigabit connections at home, I used iperf and iperf3. But I now have several 10 gigabit connections on my home network, and since these tools don’t use multithreading effectively, they become CPU bound (on a single core) before they reach the target bandwidth. Tools like ssh and scp have the same problem; they’re single threaded and become CPU bound long before they saturate a 10 gigabit connection.

When I install a 10 gigabit connection, whether it’s via SFP+ DACs, SFP+ SR optics or 10GbaseT, it’s important that I’m able to test the connection’s ability to sustain somewhere near line rate transfers end-to-end. Especially when I’m buying my DACs, transceivers or shielded cat6a patch cables from eBay or any truly inexpensive vendor. I needed a tool that could saturate a 10 gigabit connection and report the data transfer rate at the application level. Obviously due to the additional data for protocol headers and link encapsulation, this number will be lower than the link-level bandwidth, but it’s the number that ultimately matters for an application.

So, I quickly hacked together a multithreaded application to test my connections at home. It will spawn the requested number of threads (on each end) and the server will send data from each thread. Each thread gets its own TCP connection.

For a quick hack, it works well.


dwm@www:/home/dwm% mcperf -t 4 -c kiva
bandwidth: 8.531 Gbits/sec
bandwidth: 8.922 Gbits/sec
bandwidth: 9.069 Gbits/sec
bandwidth: 9.148 Gbits/sec
bandwidth: 9.197 Gbits/sec
bandwidth: 9.230 Gbits/sec
bandwidth: 9.253 Gbits/sec
bandwidth: 9.269 Gbits/sec
bandwidth: 9.283 Gbits/sec
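
For anyone curious about the shape of such a tool: spawn N threads, give each its own TCP connection, count received bytes, and periodically report the aggregate. A rough sketch of the receiving side follows; this is not the mcperf source, and it omits the PeerAuthenticator handshake, argument validation and error handling (the sending side isn’t shown).

#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include <netdb.h>
#include <unistd.h>
#include <sys/socket.h>

static std::atomic<uint64_t>  g_bytes{0};   //  bytes received by all threads

//  Connect to host:port and count received bytes until the peer closes.
static void Receiver(const std::string & host, const std::string & port)
{
  struct addrinfo  hints, *res = nullptr;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family   = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0)  { return; }
  int  fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  if ((fd >= 0) && (connect(fd, res->ai_addr, res->ai_addrlen) == 0)) {
    char     buf[1 << 16];
    ssize_t  n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
      g_bytes += n;
    }
  }
  if (fd >= 0)  { close(fd); }
  freeaddrinfo(res);
}

int main(int argc, char *argv[])
{
  //  Usage: sketch <host> <port> <numThreads>
  if (argc != 4)  { return 1; }
  int  numThreads = std::stoi(argv[3]);
  std::vector<std::thread>  threads;
  for (int i = 0; i < numThreads; ++i) {
    threads.emplace_back(Receiver, argv[1], argv[2]);
  }
  //  Report the aggregate application-level bandwidth once per second.
  uint64_t  prev = 0;
  for (int sec = 0; sec < 10; ++sec) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    uint64_t  cur = g_bytes.load();
    std::cout << "bandwidth: " << ((cur - prev) * 8) / 1.0e9 << " Gbits/sec\n";
    prev = cur;
  }
  //  Exit without waiting for the receivers; fine for a throwaway test.
  for (auto & t : threads)  { t.detach(); }
  return 0;
}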

Given that I don’t create servers that don’t use strong authentication, even if they’ll only be run for 10 seconds, I’m using the PeerAuthenticator from libDwmAuth for authentication. No encryption of the data that’s being sent, since it’s not necessary.

Of course this got me thinking about the number of tools we have today that just don’t cut it in a 10 gigabit network. ssh, scp, ftp, fetch, etc. Even NFS code has trouble saturating a 10 gigabit connection. It seems like eons ago that Herb Sutter wrote “The Free Lunch Is Over”. It was published in 2005. Yet we still have a bunch of tools that are CPU bound due to being single-threaded. How are we supposed to take full advantage of 10 gigabit and faster networks if the tools we use for file transfer, streaming, etc. are single-threaded and hence CPU bound well before they reach 10 gigabits/second? What happens when I run some fiber at home for NAS and want to run 40 gigabit or (egads!) 100 gigabit? It’s not as if I don’t have the CPU to do 40 gigabits/second; my NAS has 12 cores and 24 threads. But if an application is single-threaded, it becomes CPU bound at around 3.5 gigabits/second on a typical server CPU core. 🙁 Sure, that’s better than 1 gigabit/second but it’s less than what a single SATA SSD can do, and much less than what an NVME/M.2/striped SATA SSD/et. al. can do.

We need tools that aren’t written as if it’s 1999. I suspect that after I polish up mcperf a little bit, I’m going to work on my own replacement for scp so I can at least transfer files without being CPU bound at well below my network bandwidth.

short flurry of ssh login attempts blocked by mcblockd

mcblockd added quite a few networks during a 20 minute period today. I don’t have an explanation for the ssh login attempts all coming in during this period, but it’s nice to see that mcblockd happily blocked all of them.

While this is by no means a high rate of attempts, it’s higher than what I normally see.

May 22 11:32:10 ria mcblockd: [I] Added 185.129.60/22 (DK) to ssh_losers for 180 days
May 22 11:32:11 ria mcblockd: [I] Added 89.234.152/21 (FR) to ssh_losers for 180 days
May 22 11:32:45 ria mcblockd: [I] Added 46.233.0/18 (BG) to ssh_losers for 180 days
May 22 11:33:00 ria mcblockd: [I] Added 216.218.222/24 (US) to ssh_losers for 30 days
May 22 11:33:05 ria mcblockd: [I] Added 199.87.154/24 (CA) to ssh_losers for 30 days
May 22 11:33:15 ria mcblockd: [I] Added 78.109.16/20 (UA) to ssh_losers for 180 days
May 22 11:33:18 ria mcblockd: [I] Added 89.38.148/22 (FR) to ssh_losers for 180 days
May 22 11:33:26 ria mcblockd: [I] Added 65.19.167/24 (US) to ssh_losers for 30 days
May 22 11:34:05 ria mcblockd: [I] Added 62.212.64/19 (NL) to ssh_losers for 180 days
May 22 11:35:54 ria mcblockd: [I] Added 190.10.0/17 (CR) to ssh_losers for 180 days
May 22 11:37:16 ria mcblockd: [I] Added 192.42.116/22 (NL) to ssh_losers for 180 days
May 22 11:38:33 ria mcblockd: [I] Added 199.249.223/24 (US) to ssh_losers for 30 days
May 22 11:38:37 ria mcblockd: [I] Added 173.254.216/24 (US) to ssh_losers for 30 days
May 22 11:39:48 ria mcblockd: [I] Added 128.52.128/24 (US) to ssh_losers for 30 days
May 22 11:39:51 ria mcblockd: [I] Added 64.113.32/24 (US) to ssh_losers for 30 days
May 22 11:40:32 ria mcblockd: [I] Added 23.92.27/24 (US) to ssh_losers for 30 days
May 22 11:40:50 ria mcblockd: [I] Added 162.221.202/24 (CA) to ssh_losers for 30 days
May 22 11:42:42 ria mcblockd: [I] Added 91.213.8/24 (UA) to ssh_losers for 180 days
May 22 11:43:37 ria mcblockd: [I] Added 162.247.72/24 (US) to ssh_losers for 30 days
May 22 11:44:34 ria mcblockd: [I] Added 193.110.157/24 (NL) to ssh_losers for 180 days
May 22 11:44:38 ria mcblockd: [I] Added 128.127.104/23 (SE) to ssh_losers for 180 days
May 22 11:45:50 ria mcblockd: [I] Added 179.43.128/18 (CH) to ssh_losers for 180 days
May 22 11:45:55 ria mcblockd: [I] Added 89.144.0/18 (DE) to ssh_losers for 180 days
May 22 11:46:29 ria mcblockd: [I] Added 197.231.220/22 (LR) to ssh_losers for 180 days
May 22 11:46:44 ria mcblockd: [I] Added 195.254.132/22 (RO) to ssh_losers for 180 days
May 22 11:46:54 ria mcblockd: [I] Added 154.16.244/24 (US) to ssh_losers for 30 days
May 22 11:47:52 ria mcblockd: [I] Added 87.118.64/18 (DE) to ssh_losers for 180 days
May 22 11:48:51 ria mcblockd: [I] Added 46.165.224/19 (DE) to ssh_losers for 180 days
May 22 11:50:13 ria mcblockd: [I] Added 178.17.168/21 (MD) to ssh_losers for 180 days
May 22 11:50:47 ria mcblockd: [I] Added 31.41.216/21 (UA) to ssh_losers for 180 days
May 22 11:50:55 ria mcblockd: [I] Added 62.102.144/21 (SE) to ssh_losers for 180 days
May 22 11:51:19 ria mcblockd: [I] Added 64.137.244/24 (CA) to ssh_losers for 30 days
May 22 11:52:28 ria mcblockd: [I] Added 80.244.80/20 (SE) to ssh_losers for 180 days
May 22 11:52:42 ria mcblockd: [I] Added 192.160.102/24 (CA) to ssh_losers for 30 days
May 22 11:53:06 ria mcblockd: [I] Added 176.10.96/19 (CH) to ssh_losers for 180 days
May 22 11:55:38 ria mcblockd: [I] Added 77.248/14 (NL) to ssh_losers for 180 days
May 22 11:56:20 ria mcblockd: [I] Added 199.119.112/24 (US) to ssh_losers for 30 days
May 22 11:56:32 ria mcblockd: [I] Added 94.142.240/21 (NL) to ssh_losers for 180 days