Modernizing mcpigdo

My custom garage door opener appliance is running FreeBSD 11.0-ALPHA5 on a Raspberry Pi 2B. It has worked fine for about 8 years now. However, I want to migrate it to FreeBSD 13.2-STABLE, move from libDwmAuth to libDwmCredence, and generally bring the code current.

The tricky part is that I never quite finished packaging up my device driver for the rotary encoders, and it was somewhat experimental (hence the alpha release of FreeBSD). But as of today it appears I have the rotary encoder device drivers working fine on FreeBSD 13.2-STABLE on a Raspberry Pi 4B. The unit tests for libDwmPi are passing, and I’m adding to them and doing a little cleanup so I’ll be able to maintain it longer-term.

I should note that the reason I went with FreeBSD at the time was pretty simple: the kernel infrastructure for what I needed to do was significantly better than linux's. That may or may not be true today, but for the moment I have no need to look at doing this on linux. The only non-portable code here is my device driver, and it's relatively tiny (including boilerplate stuff).

Looking back at this project, I should have built a few more of these, hardware-wise. The Raspberry Pi 2B is more than powerful enough for the job, and given that I put it inside a sealed enclosure, the lower power consumption versus a 4B is nice. I'm pretty sure my mom would appreciate one of these, if only by virtue of being able to open her garage doors with her phone or watch. The hardware (the Pi and the HAT I created) has been flawless; I've had literally zero issues despite it being in a garage with no climate control (so it's seen plenty of -10F and 95F days). It just works.

However, today I could likely do this in a smaller enclosure, thanks to PoE HATs. Unfortunately not the official latest Raspberry Pi PoE HAT, because its efficiency is abysmal (it generates too much heat). If I bump the Pi to a 4B, I'll probably stick with a separate PoE splitter (fanless). I'll need a new one since the power connector has changed.

The arguments for moving to a Pi 4B:

  • future-proofing. If I want to build another one, I’m steered toward the Pi 4B simply because it’s what I can buy and what’s current.
  • faster networking (1G versus 100M)
  • more oomph for compiling C and C++ code locally
  • Some day, the Pi 2B is going to stop working. I’ve no idea when that day might be, but 8 years in Michigan weather seems like it has probably taken a significant toll. On the other hand it could last another 20 years. There are no electrolytic capacitors, I’m using it headless, and none of the USB ports are in use.

The arguments against it:

  • higher power consumption, hence more heat
  • the Pi 2B isn’t dead yet

I think it’s pretty clear that during this process, I should try a Pi 4B. The day will come when I’ll have to abandon the 2B, and I’d rather do it on my timeline. No harm in keeping the 2B in a box while I try a 4B. Other than the PoE splitter, it should be a simple swap. Toward that end, I ordered a 4B with 4G of RAM (I don’t need 8G of RAM here). I still need to order a PoE splitter, but I can probably scavenge an original V2 PoE HAT from one of my other Pis and stack with stacking headers.

Over the weekend I started building FreeBSD 13.2-STABLE (buildworld) on the Pi 2B and, as usual, hit its limits. The problem is that 1G of RAM isn't sufficient to utilize the 4 cores. It's terribly slow even when you can use all 4 cores, but if you start swapping to a microSD card… it takes days for 'make buildworld' to finish. And since I have a device driver I'm maintaining for this device, I expect to rebuild the kernel somewhat regularly and build the world occasionally. This is the main motivation for bumping to a Raspberry Pi 4B with 4G of RAM. It's possible it'll still occasionally start swapping with a 'make -j4 buildworld', but the cores are faster, and while a single instance of the compiler or llvm-tblgen does occasionally go over 500M resident, it's not frequent. I think 4G is sufficient to avoid swapping during a full build.
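The arithmetic behind that hunch can be sketched as a back-of-envelope check. The ~500M-per-job figure is just my casual observation, and the helper below is illustrative (not from any of my projects):

```cpp
#include <algorithm>

// Back-of-envelope: how many parallel build jobs fit in RAM without swapping,
// assuming each compiler instance peaks near perJobGB resident. This ignores
// the OS and other processes, so it's optimistic.
constexpr int MaxJobsWithoutSwap(double ramGB, double perJobGB, int cores)
{
  int byRam = static_cast<int>(ramGB / perJobGB);
  return std::min(cores, byRam);
}
```

With half a gigabyte per job, a 1G Pi 2B can only keep 2 of its 4 cores busy before swapping, while a 4G Pi 4B is core-bound at 4 jobs with headroom to spare.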

Update Aug 26, 2023: duh, a while after I first created mcpigdo, it became possible to do what I need to do with the rotary encoders from user space. With FreeBSD 13.2, I can configure interrupts on the GPIO pins and be notified via a number of means. I’m going to work on changing my code to not need my device driver. This is good news since I’ve had some problems with my very old device driver despite refactoring, and I don’t have time to keep maintaining it. Moving my code to user space will make it more portable going forward, though it’ll still be FreeBSD-only. It will also allow for more flexibility.
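Setting aside the gpioc event plumbing, the core of a user-space rotary encoder reader is just a quadrature state machine fed by edge events from the two pins. A minimal sketch (the class name and shape are mine, not mcpigdo's actual code):

```cpp
#include <cstdint>

// Quadrature decoder for a two-channel (A/B) rotary encoder. Each GPIO edge
// event updates the 2-bit state (A<<1)|B; the table maps a (previous,current)
// state pair to a step of -1, 0 or +1. An invalid transition (both bits
// changed) yields 0. Note the first transition after startup may register a
// spurious step unless _prev is seeded from the actual pin states.
class QuadratureDecoder {
public:
  // Returns the step for this transition and accumulates position.
  int Update(bool a, bool b)
  {
    static constexpr int8_t steps[16] = {
       0, -1,  1,  0,
       1,  0,  0, -1,
      -1,  0,  0,  1,
       0,  1, -1,  0
    };
    uint8_t curr = static_cast<uint8_t>((a << 1) | b);
    int step = steps[(_prev << 2) | curr];
    _prev = curr;
    _position += step;
    return step;
  }

  long Position() const { return _position; }

private:
  uint8_t _prev = 0;
  long    _position = 0;
};
```

One full detent cycle (four edges) accumulates 4 counts in one direction or the other; the kernel driver presumably did the equivalent inside its interrupt handler.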

Striping 4 Samsung 990 Pro 2TB on Ubuntu 22.04

On Prime Day I ordered four Samsung 990 Pro 2TB NVMe SSDs to install in my Threadripper machine. I’ve had an unopened Asus Hyper M.2 x16 Gen4 card for years waiting for drives. Just never got around to finishing the plan for my Threadripper machine.

The initial impression is positive. Just for fun, I striped all 4 of them and put an ext4 filesystem on the group, just to grab some out-of-the-box numbers. First up: a simple read test, which yielded more than 24 gigabytes/second. Nice.

dwm@thrip:/hyperx/dwm% fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)

TEST: (groupid=0, jobs=1): err= 0: pid=6333: Wed Jul 19 02:11:19 2023
  read: IOPS=25.2k, BW=24.6GiB/s (26.4GB/s)(10.0GiB/407msec)
    slat (usec): min=27, max=456, avg=38.46, stdev=21.03
    clat (usec): min=174, max=10736, avg=1206.67, stdev=443.19
     lat (usec): min=207, max=11193, avg=1245.21, stdev=460.96
    clat percentiles (usec):
     |  1.00th=[  971],  5.00th=[ 1020], 10.00th=[ 1037], 20.00th=[ 1057],
     | 30.00th=[ 1074], 40.00th=[ 1074], 50.00th=[ 1074], 60.00th=[ 1090],
     | 70.00th=[ 1123], 80.00th=[ 1172], 90.00th=[ 1975], 95.00th=[ 2024],
     | 99.00th=[ 2245], 99.50th=[ 2278], 99.90th=[ 7832], 99.95th=[ 9241],
     | 99.99th=[10421]
  lat (usec)   : 250=0.05%, 500=0.24%, 750=0.26%, 1000=1.76%
  lat (msec)   : 2=88.58%, 4=8.87%, 10=0.21%, 20=0.03%
  cpu          : usr=2.71%, sys=96.06%, ctx=144, majf=0, minf=8205
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=24.6GiB/s (26.4GB/s), 24.6GiB/s-24.6GiB/s (26.4GB/s-26.4GB/s), io=10.0GiB (10.7GB), run=407-407msec

Disk stats (read/write):
    dm-0: ios=151773/272, merge=0/0, ticks=47264/0, in_queue=47264, util=83.33%, aggrios=10240/21, aggrmerge=30720/63, aggrticks=3121/2, aggrin_queue=3124, aggrutil=76.09%
  nvme3n1: ios=10240/21, merge=30720/63, ticks=3146/3, in_queue=3149, util=76.09%
  nvme4n1: ios=10240/21, merge=30720/63, ticks=3653/3, in_queue=3657, util=76.09%
  nvme1n1: ios=10240/21, merge=30720/63, ticks=2504/3, in_queue=2507, util=76.09%
  nvme2n1: ios=10240/21, merge=30720/63, ticks=3182/2, in_queue=3184, util=76.09%

A short while later, I ran a simple write test. Here I see more than 13 gigabytes/second.

dwm@thrip:/hyperx/dwm% fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process

TEST: (groupid=0, jobs=1): err= 0: pid=6682: Wed Jul 19 02:15:31 2023
  write: IOPS=13.7k, BW=13.4GiB/s (14.4GB/s)(10.0GiB/746msec); 0 zone resets
    slat (usec): min=35, max=297, avg=69.19, stdev=14.38
    clat (usec): min=48, max=9779, avg=2242.89, stdev=738.00
     lat (usec): min=105, max=9837, avg=2312.18, stdev=740.08
    clat percentiles (usec):
     |  1.00th=[ 1549],  5.00th=[ 2040], 10.00th=[ 2057], 20.00th=[ 2073],
     | 30.00th=[ 2089], 40.00th=[ 2089], 50.00th=[ 2114], 60.00th=[ 2114],
     | 70.00th=[ 2114], 80.00th=[ 2147], 90.00th=[ 2278], 95.00th=[ 3195],
     | 99.00th=[ 6456], 99.50th=[ 8979], 99.90th=[ 9503], 99.95th=[ 9634],
     | 99.99th=[ 9765]
   bw (  MiB/s): min=13578, max=13578, per=98.92%, avg=13578.00, stdev= 0.00, samples=1
   iops        : min=13578, max=13578, avg=13578.00, stdev= 0.00, samples=1
  lat (usec)   : 50=0.01%, 100=0.02%, 250=0.09%, 500=0.15%, 750=0.16%
  lat (usec)   : 1000=0.19%
  lat (msec)   : 2=1.06%, 4=96.87%, 10=1.46%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=180, max=180, avg=180.00, stdev= 0.00
    sync percentiles (nsec):
     |  1.00th=[  181],  5.00th=[  181], 10.00th=[  181], 20.00th=[  181],
     | 30.00th=[  181], 40.00th=[  181], 50.00th=[  181], 60.00th=[  181],
     | 70.00th=[  181], 80.00th=[  181], 90.00th=[  181], 95.00th=[  181],
     | 99.00th=[  181], 99.50th=[  181], 99.90th=[  181], 99.95th=[  181],
     | 99.99th=[  181]
  cpu          : usr=36.11%, sys=53.56%, ctx=9861, majf=0, minf=14
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10240,0,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=13.4GiB/s (14.4GB/s), 13.4GiB/s-13.4GiB/s (14.4GB/s-14.4GB/s), io=10.0GiB (10.7GB), run=746-746msec

Disk stats (read/write):
    dm-0: ios=0/135825, merge=0/0, ticks=0/12864, in_queue=12864, util=87.00%, aggrios=0/10276, aggrmerge=0/30823, aggrticks=0/1124, aggrin_queue=1125, aggrutil=82.63%
  nvme3n1: ios=0/10275, merge=0/30822, ticks=0/1109, in_queue=1109, util=82.20%
  nvme4n1: ios=0/10276, merge=0/30826, ticks=0/1000, in_queue=1001, util=82.63%
  nvme1n1: ios=0/10276, merge=0/30822, ticks=0/1366, in_queue=1367, util=82.63%
  nvme2n1: ios=0/10277, merge=0/30825, ticks=0/1022, in_queue=1023, util=82.20%

It’s worth noting that I don’t consider this configuration a good idea for anything other than scratch space (perhaps for DL training data sets, etc.); 4 striped drives is, as my friend Ben put it, risky. I of course trust SSDs more than spinning rust here, and historically I’ve had no failures with Samsung SSDs, but… that’s a hard thing to judge from just my personal observations and from where the industry has gone. I still have Samsung SATA SSDs from the 830 and 840 series, and they’re still healthy. But… we’ve gone from SLC to MLC to TLC to QLC, losing a hair of reliability (and a chunk of warranty) at each step. And I’d be remiss if I didn’t mention Samsung’s botched firmware in the last two generations (980 and 990). In fact I’m annoyed that 2 of the 4 drives I received have old firmware that I’ll need to update.

Raspberry Pi PoE+ is a typo (should be PoS)

Seriously. ‘S’ is adjacent to ‘E’ on a QWERTY keyboard.

I knew the official PoE+ HATs were pieces of poop before I bought them. This isn’t news. You don’t have to look hard to find Jeff Geerling’s comments, Martin Rowan’s comments or the many others who’ve complained. I had read them literally years before I purchased.

I decided to buy 4 of them despite the problems, for a specific purpose (4 rack mounted Pi4B 8G, all powered via PoE alone). I’ve had those running for a few days and they’re working. They’re inefficient, but so far they work.

I also ordered 2 more, with the intent of using one of them on a Raspberry Pi 4B 8G in an Anidees Pro case and keeping the other as a spare. Well, within literally 36 hours, one of them was dead. I believe it destroyed itself via heat. And therein lies part of the problem. I’ll explain what I casually observed, since I wasn’t taking measurements.

I ran the Pi from USB-C for about a day, without the PoE HAT installed. It was in the Anidees Pro case, fully assembled. It was fine, idling around 37.4C and not seeming to go above 44C when running high loads (make -j4 on some medium-sized C++ projects; a prominent workload for me). Solid proof that for my use, the Anidees Pro case works exactly as intended. The case is a big heatsink. Note that I have the 5mm spacers installed for the lid, so it’s open around the entire top perimeter.

I then installed the PoE+ HAT, with extension headers and the correct length standoffs that are needed in the Anidees Pro case. Note that this activity isn’t trivial; the standoffs occupy the same screw holes as the bottom of the case (from opposite directions), and an unmodified standoff is likely to bottom out as it collides with the end of the case bottom screw. You can shorten the threaded end of the standoff, or do as I did and use shorter standoffs and add a nut and washers to take up some of the thread. I don’t advise shortening the screws for the bottom of the case.

I plugged in the PoE ethernet from my office lab 8-port PoE switch, which has been powering the 4 racked Pis for a few days, and observed the expected horrible noise noted by others. Since I expected it, I immediately unplugged the USB-C power. I continued installing software and started compiling and installing my own software (libDwm, libCredence, mcweather, DwmDns, libDwmWebUtils, mcloc, mcrover, etc.). It was late, so I stopped here. On my way out of the home office, I put my hand on the Pi. It was much warmer than when running from USB-C; in fact, uncomfortably warm. I checked the CPU temperature with vcgencmd; it was under 40C. Hmm. I was tired, so I decided to leave it until the next day and see what happened.

In the morning the Pi had no power. I unplugged and plugged both ends of the 12″ PoE cable. Nothing.

It turns out that the PoE+ HAT is dead. Less than 48 hours of runtime. As near as I can tell, it cooked itself. The PoE port on the ethernet switch still works great. The Pi still works great (powered from USB-C after the dead PoE+ HAT was removed).

I find this saddening and unacceptable. “If it’s not tested, it’s broken.” Hey Eben: it’s broken. No, literally, it’s broken. And it looks to not even be smoke tested. In fact I’d say it’s worse than the issues with Rev. 1 of the original PoE HAT. This is a design problem, not a testing problem. In other words, the problem occurred at the very beginning of the process, which means it passed through all of engineering. And this issue lies with leadership, not the engineers.

So not only have you gone backward, you’ve gone further back than you were for Rev. 1 of the PoE HAT. And you discontinued the only good PoE HAT you had? Now I’m just left with, “Don’t believe anything, ANYTHING, from the mouth of Eben Upton.”

I’m angry because my trust of Raspberry Pi has been eroding for years, and this is just another lump of coal. We hobbyists were basically screwed for 3 years on availability of all things Pi 4, and you’re still selling a PoE HAT that no one should use.

I’ve been saying this for a couple of years now: there is opportunity for disruption here. While I appreciate the things the Raspberry Pi Foundation has done, I’m starting to feel like I can’t tell anyone that the Pi hardware landscape is great. In fact for many things, it has stagnated.

For anyone for whom 20 US dollars matters, do NOT buy an official PoE+ HAT. Not that it matters… it’s June 2023 and it’s still not trivial to find something to use it with (a Raspberry Pi 3 or 4).

There comes a time when a platform dies. Platforms get lazy after an ecosystem builds around them. I’m wondering if I’m seeing that on the horizon for the Raspberry Pi.

More Raspberry Pi? Thanks!

Years ago I bought a fairly simple 1U rack mount for four Raspberry Pi Model 4B computers. Then the COVID-19 pandemic happened, and for years it wasn’t possible to find Model 4Bs with 4G or 8G of RAM at anything less than scandalous scalper prices. So the inexpensive rack mount sat for years collecting dust.

This month, June 2023, I was finally able to buy four Model 4B Raspberry Pis with 8G of RAM, at retail price ($75 each). Hallelujah.

I also bought four of the PoE+ HATs, which IMHO suck compared to the v2 of the original PoE HAT: the efficiency is terrible at the tiny loads I have on them (no peripherals), so they consume a lot more power and waste it as heat. I don’t need to repeat what’s been written elsewhere by those who’ve published measurements. There also appears to be a PoE-to-USB-C isolation issue, but fortunately for me I won’t have anything plugged into the USB-C on these Pis.

The plan is to put these four Pis in the wall-mounted switch rack in the basement. They’re mostly going to provide physical redundancy for services I run that don’t require much CPU or network and storage bandwidth. DNS, DHCP, mcrover and mcweather, for example.

I am using Samsung Pro Endurance 128G microSD cards for longevity. If I needed more and faster I/O, I’d be using a rack with space for M.2 SATA per Pi, but I don’t need it for these.

I’ve loaded the latest Raspberry Pi OS Lite 64-bit on them, configured DHCP and DNS for them (later I’ll configure static IPs on them), and started installing the things I know I want/need. They all have their PoE+ HATs on, and are installed in the rack mount. I’ll put the mount into the rack this weekend. The Pis are named grover, nomnom, snoopy and lassie.

Separately, I ordered 2 more Raspberry Pis (same model 4B with 8G of RAM), two more PoE+ HATs and 2 cases: an Argon ONE v2 and an Anidees AI-PI4-SG-PRO. Both of these turn a significant part of the case into a heatsink.

The Argon ONE v2 comes with a fan and can’t use the PoE+ HAT, but can accept an M.2 SATA add-on. I’m planning to play with using this one in the master bedroom, connected to the TV. It’s nice that it routes everything to the rear of the case; it’s much easier to use in an entertainment center context.

I believe the Anidees AI-PI4-SG-PRO will allow me to use a PoE+ HAT, but I’ll need extension headers which I’ll order soon. I’ve liked my other Anidees cases, and I think this latest one should be the best I’ve had from them. They’re pricey but premium.

It’s nice that I can finally do some of the work I planned years ago. Despite hoping that I’d see RISC-V equivalents by now, the reality is that the Pi has a much larger ecosystem than any alternative. It’s still the go-to for several things, and I’m happy.

mcblockd 5 years on

mcblockd, the firewall automation I created 5 years ago, continues to work.

However, it’s interesting to note how things have changed. Looking at just the addresses I block from accessing port 22…

While China remains at the top of my list of total number of blocked IP addresses, the US is now in 2nd place. In 2017, the US wasn’t even in the top 20. What has changed?

Most of the change here is driven by my automation seeing more and more attacks originating from cloud-hosted services: Amazon EC2, Google, Microsoft, DigitalOcean, Linode, Oracle, et al. While my automation policy won’t go wider than a /24 for a probe from a known US entity, over time I see probes from entire swaths of contiguous /24 networks in the same address space allocation, which get coalesced to reduce firewall table size. Two adjacent /24 networks become a single /23. Two adjacent /23 networks become a single /22. And so on, up to a possible /8 (the automation stops there).
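The coalescing rule is easy to sketch: two prefixes are siblings when they differ only in the last bit of their network portion, and siblings collapse into their parent. This is a simplified illustration (names are mine, and mcblockd's real code also has to deal with expirations and policy), not mcblockd itself:

```cpp
#include <cstdint>
#include <set>
#include <utility>

using Prefix = std::pair<uint32_t,uint8_t>;  // IPv4 network address, prefix length

// Repeatedly merge sibling prefixes: two adjacent /n networks that share the
// same /(n-1) parent collapse into that parent. Assumes every entry is
// aligned to its prefix length. Stops widening at /8, like the automation.
std::set<Prefix> Coalesce(std::set<Prefix> nets)
{
  bool merged = true;
  while (merged) {
    merged = false;
    for (auto it = nets.begin(); it != nets.end(); ++it) {
      auto [addr, len] = *it;
      if (len <= 8)
        continue;                              // never widen beyond /8
      uint32_t sibling = addr ^ (1u << (32 - len));
      auto sib = nets.find({sibling, len});
      if (sib != nets.end()) {
        uint32_t parent = addr & ~((1u << (32 - (len - 1))) - 1);
        nets.erase(sib);
        nets.erase(it);
        nets.insert({parent, static_cast<uint8_t>(len - 1)});
        merged = true;
        break;                                 // iterators invalidated; rescan
      }
    }
  }
  return nets;
}
```

For example, 192.0.2.0/24 and 192.0.3.0/24 collapse into 192.0.2.0/23, and four contiguous /24s starting at 192.0.0.0 cascade all the way to 192.0.0.0/22.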

So today, the last of 2022, I see some very large blocks owned by our cloud providers being blocked by my automation due to receiving ssh probes from large contiguous swaths of their address space.

I am very appreciative of the good things from big tech. But I’m starting to see the current cloud computing companies as the arms dealers of cyberspace.

My top 2 countries:

    CN 131,560,960 addresses
       /9 networks:    1 (8,388,608 addresses)
      /10 networks:   10 (41,943,040 addresses)
      /11 networks:   12 (25,165,824 addresses)
      /12 networks:   18 (18,874,368 addresses)
      /13 networks:   29 (15,204,352 addresses)
      /14 networks:   48 (12,582,912 addresses)
      /15 networks:   48 (6,291,456 addresses)
      /16 networks:   37 (2,424,832 addresses)
      /17 networks:   14 (458,752 addresses)
      /18 networks:    7 (114,688 addresses)
      /19 networks:   10 (81,920 addresses)
      /20 networks:    5 (20,480 addresses)
      /21 networks:    3 (6,144 addresses)
      /22 networks:    3 (3,072 addresses)
      /23 networks:    1 (512 addresses)

    US 92,199,996 addresses
       /9 networks:    3 (25,165,824 addresses)
      /10 networks:    5 (20,971,520 addresses)
      /11 networks:   10 (20,971,520 addresses)
      /12 networks:    9 (9,437,184 addresses)
      /13 networks:   16 (8,388,608 addresses)
      /14 networks:   10 (2,621,440 addresses)
      /15 networks:    8 (1,048,576 addresses)
      /16 networks:   42 (2,752,512 addresses)
      /17 networks:   10 (327,680 addresses)
      /18 networks:   11 (180,224 addresses)
      /19 networks:    8 (65,536 addresses)
      /20 networks:   10 (40,960 addresses)
      /21 networks:    2 (4,096 addresses)
      /22 networks:    9 (9,216 addresses)
      /23 networks:    9 (4,608 addresses)
      /24 networks:  818 (209,408 addresses)
      /25 networks:    4 (512 addresses)
      /26 networks:    5 (320 addresses)
      /27 networks:    5 (160 addresses)
      /28 networks:    2 (32 addresses)
      /29 networks:    7 (56 addresses)
      /30 networks:    1 (4 addresses)

You can clearly see the effect of my automation policy for the US. Lots of /24 networks get added, most of them with a 30 to 35 day expiration. Note that expirations increase for repeat offenses. But over time, as contiguous /24 networks are added due to sending probes at my firewall, aggregation will lead to wider net masks (shorter prefix lengths). Since I’m sorting countries based on the total number of addresses I’m blocking, obviously shorter prefixes have a much more profound effect than longer prefixes.

TREBLEET Super Thunderbolt 3 Dock: First Impressions

TREBLEET Super Thunderbolt 3 Dock at Amazon

https://www.trebleet.com/product-page/mac-mini-thunderbolt-3-dock-with-nvme-sata-slot-cfexpress-card-slot-gray

I received this on August 25, 2022. I immediately installed a Samsung 980 Pro 1TB NVMe, then plugged the dock into AC power via the included power supply brick and into the Mac Studio M1 Ultra via the included Thunderbolt 3 cable. The performance to/from the Samsung 980 Pro 1TB NVMe is what I had hoped for.

This is more than 3X faster than any other dock in this form factor available today. Sure, it’s not PCIe 4.0 NVMe speeds, but given that all other docks available in this form factor max out at 770 MB/s, and that Thunderbolt 3/4 tops out at 40 Gb/s (5 GB/s) raw, this is great.

I also checked some of the data in system report. All looks OK.

My first impression: this is the only dock to buy if you want NVMe in this form factor. Nothing else comes close speed-wise. Yes it’s pricey. Yes, it’s not a big brand name in North America. But they did the right thing with PCIe lane allocation, which hasn’t happened with OWC, Satechi or anyone else.

There’s really no point in buying a dock with NVMe if it won’t ever be able to run much faster than a good SATA SSD (I hope OWC, Satechi, Hagibis, AGPTEK, Qwizlab and others are paying attention). Buy this dock if you need NVMe storage. I can’t speak to longevity yet, but my initial rating: 5 out of 5 stars.

Mac Studio M1 Ultra: Thanks, Apple!

I finally have a computer I LIKE to place on my desk. I’m speaking of the Mac Studio M1 Ultra.

Apple finally created a desktop that mostly fits my needs. My only wishlist item: upgradeable internal storage (DIY or at Apple Store, I don’t care).

This was partly coincidence. The Mac Studio with M1 Ultra ticks the boxes I care about for my primary desktop. Faster than my Threadripper 3960X for compiling my C++ projects while small, aesthetically pleasing, quiet and cool. 10G ethernet? Check. 128G RAM? Check. Enough CPU cores for my work? Check. Fast internal storage? Check. Low power consumption? Check.

I’m serious: thanks, Apple!

This machine won’t be for everyone. News flash: no machine is for everyone. But for my current and foreseeable primary desktop needs, it’s great. And it’ll remain that way as long as we still have accessories for Thunderbolt available that are designed for the Mac Mini or Mac Studio. This isn’t a substitute for the Mac Pro; I can’t put PCIe cards in it, nor 1.5TB of RAM (or any beyond the 128G that came with mine). It’s also way more than a current Mac Mini. But that’s the point: it fills a spot that was empty in Apple’s lineup for a decade, which happens to be the sweet spot for people like me. Time is money, but I don’t need GPUs. I don’t need 1.5TB of RAM. I don’t need 100G ethernet (though I do need 10G ethernet). I’m not a video editor nor photographer; my ideal display is 21:9 at around 38 inches, for productivity (many Terminal windows), not for media. Hence the Studio Display and the Pro XDR are not good fits for me. But the Mac Studio M1 Ultra does what I need, really well.

Some people at Apple did their homework. Some championed what was done. Some did some really fine work putting it all together, from design to manufacturing. Some probably argued that it was a stopgap until the Apple silicon Mac Pro, and that’s true.

That last part doesn’t make it a temporary product. Apple, please please please keep this tier alive. There are many of us out here who can’t work effectively with a Mac Mini, iMac or MacBook Pro but find it impossible to cost-justify a Mac Pro. And post-COVID there are many of us with multiple offices, one of which is at home. At home I don’t need a Mac Pro, nor do I really want one in my living space. I need just enough oomph to do real work efficiently, but don’t want a tower on my desk or the floor or even a rack-mounted machine (my home office racks are full).

I don’t care what machine occupies this space. But I’ll buy in this space, again and again, whereas I don’t see myself ever buying a Mac Pro for home with the current pricing structure.

Mac Studio M1 Ultra: The First Drive

Given that my new Mac Studio M1 Ultra is an ‘open box’ unit, I needed to fire it up and make sure that it works properly. One of the things I needed to check: that it works fine with my Dell U3818DW via USB-C for display. I have seen many reports of problems with ultra wide displays and M1 Macs, and I do not have a new display on my shopping list.

So on Sunday I left my hackintosh plugged in to the DisplayPort on the U3818DW, and plugged the Mac Studio into the USB-C port. It looks to me like it works just fine. I get native resolution, 3840×1600, with no fuss.

I am using a new Apple Magic Trackpad 2, and an old WASD CODE keyboard just to set things up. I don’t really need the new trackpad, since eventually I’ll decommission my hackintosh and take the trackpad from there. But I need one during the transition, and it was on sale at B&H.

With just a 30 minute spin… wow. I honestly can’t believe how zippy this machine is, right out of the box. Therein lies the beauty of using the same desktop computer for 10 years; when you finally upgrade, the odds are very good that you’re going to notice a significant improvement. In some cases, some of it will just be “less accumulated cruft launched at startup and login”. But in 10 years, the hardware is going to be much faster.

Compiling libDwm on the Mac Studio M1 Ultra with ‘make -j20’ takes 32 seconds. Compiling it on my Threadripper 3960X machine with 256G of RAM with ‘make -j24’ takes 40 seconds. You read that correctly… the M1 Ultra soundly beats my Threadripper 3960X for my most common ‘oomph’ activity (compiling C++ code), despite having a slower base clock and only having 16 performance cores and 4 efficiency cores. While using a fraction of the electricity. Bravo!

“Moore’s Law is dead.” In the strictest sense, just on transistor density, this is mostly true. Process shrink has slowed down, etc. But the rules changed for many computing domains long before we were talking about TSMC 5nm. See Herb Sutter’s “The Free Lunch is Over”. Dies have grown (more transistors), core counts have grown, and clock speed has increased, but very slowly when compared to the olden days. Cache is, well, something you really need to look at when buying a CPU for a given workload.

This last point is something I haven’t had time to research, in terms of analysis. If you need performant software on a general purpose computer, cache friendliness is likely to matter. Up until recently, reaching out to RAM versus on-chip or on-die cache came with a severe penalty. That of course remains true on our common platforms (including Apple silicon). However, Apple put the RAM in the SoC. For the M1 Ultra, the bandwidth is 800 GB/sec. DDR4-3200 is 25.6 GB/sec per channel, so 204.8 GB/sec even with 8 channels; DDR5-4800 is 38.4 GB/sec per channel, 307.2 GB/sec with 8. And 8 channels is server territory; a typical desktop gets 2. Let that sink in for a moment… the memory bandwidth of the M1 Ultra is more than an order of magnitude higher than a dual-channel DDR4 desktop, and well beyond even 8-channel servers. My question: how significant has this been for the benchmarks and real work loads? If significant, does this mean we’re going to see the industry follow Apple here? AMD and Intel releasing SoCs with CPU and RAM?
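Those bandwidth figures fall out of a one-line formula: transfer rate times the 8-byte width of a 64-bit channel, times the channel count. A quick sanity check (the helper name is mine):

```cpp
// Peak DRAM bandwidth in GB/s: mega-transfers/sec times 8 bytes per 64-bit
// channel, times the number of channels. Theoretical peak, not sustained.
constexpr double PeakGBps(double megaTransfersPerSec, int channels)
{
  return megaTransfersPerSec * 8.0 * channels / 1000.0;
}
```

So DDR4-3200 yields 25.6 GB/s per channel (204.8 GB/s across 8), and DDR5-4800 yields 38.4 GB/s per channel (76.8 GB/s for a dual-channel desktop), all well short of the M1 Ultra’s 800 GB/s.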

I know there are tinkerers that bemoan this future. But we bemoan the loss of many things in computing. I’m going to remain optimistic. Do I personally really care if today’s CPU + RAM purchase turns into an SoC purchase? To be honest, not really. But that’s just me; computing needs are very diverse. Those of us who tinker, well, we might just wind up tinkering with fewer parts. I don’t see the whole PC industry reversing any time soon in a manner that creates a walled garden any more than what we have today. It’s not like the current industry hasn’t been good for Intel and AMD. Yes, computing needs have diversified and we’ve put ‘enough’ power into smaller devices to meet the needs of many more consumers. And Intel and AMD have largely been absent in mobile. But they’ve maintained a solid foothold in the server market, cloud infrastructure, HPC, etc. As a consumer I appreciate the diversity of options in the current marketplace. We speak with our wallets. If we’re a market, I trust we’ll be served.

Apple turned heads here. For some computing needs (including my primary desktop), it appears the M1 Mac Studio is a winner. It doesn’t replace my Linux and Windows workstation, nor any of my servers, nor any of my Raspberry Pis. But for what I (and some others) need from a desktop computer, the M1 Mac Studio is the best thing Apple has done in quite some time. It hits the right points for some of us, in a price tier that’s been empty since the original cheese grater Mac Pro (2006 to way-too-late 2013). It also happens to be a nice jolt of competition. This is good for us, the consumers. Even if I never desired an Apple product, I’d celebrate. Kudos to Apple. And thanks!

Mac Studio M1 Ultra: The Decision

I’ve needed a macOS desktop for many years. My hackintosh, built when Apple had no current hardware to do what I needed to do, is more than 10 years old. It’s my primary desktop. It’s behind on OS updates (WAY behind). It’s old. To be honest, I’m quite surprised it still runs at all. Especially the AIO CPU cooler.

The urgency was amplified when Apple silicon for macOS hit the streets. Apple is in transition, and at some point in the future there will be no support for macOS on Intel. They’ve replaced Intel in the laptops, there’s an M1 iMac and an M1 mini, and now the Mac Studio. We’ve yet to see an Apple silicon Mac Pro, and while I’m sure it’s coming, I can’t say when, nor anything about the pricing. If I assume roughly 2X multicore performance versus the M1 Ultra SoC, plus reasonable PCIe expansion, it’ll likely be out of my price range.

Fortunately, for today and the foreseeable future, the Mac Studio fits my needs. In terms of time == money, my main use is compiling C++ and C code. While single-core performance helps, so does multi-core for any build that has many units that can be compiled in parallel. So, for example, the Mac Studio with M1 Ultra has 20 CPU cores. Meaning my builds can compile 20 compilation units in parallel. Obviously there are points in my builds where I need to invoke the archiver or the linker on the result of many compiles. Meaning that for parts of the build, we’ll be single-core for a short period unless the tool itself (archiver, linker, etc.) uses multiple threads.

It’s important to note that a modern C++ compiler is, in general, a memory hog. It’s pretty common for me to see clang using 1G of RAM (resident!). Run 20 instances, that’s 20G of RAM. In other words, the 20 cores need at least 1G each to run without swapping. Add on all the apps I normally have running, and 32G is not enough RAM for me to make really effective use of the 20 cores, day in and day out.
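Back-of-the-envelope, it’s the RAM that caps the useful parallelism, not the core count. A quick sketch of that arithmetic in sh (the 16G “everything else” figure is just an assumption for illustration, not a measurement from my machine):

```shell
#!/bin/sh
# Sketch: pick a parallel build job count that won't swap, assuming
# each clang instance needs ~1 GiB resident (figure from above).
ncpu=20          # CPU cores (e.g. `sysctl -n hw.ncpu` on an M1 Ultra)
ram_gib=32       # installed RAM
other_gib=16     # RAM used by everything else -- an assumed figure
avail=$((ram_gib - other_gib))
if [ "$avail" -lt "$ncpu" ]; then
    jobs=$avail    # RAM-bound: fewer jobs than cores
else
    jobs=$ncpu     # CPU-bound: one job per core
fi
echo "make -j$jobs"
```

With 32G installed, the sketch lands at `make -j16`: four of the 20 cores sit idle. Bump `ram_gib` to 64 and the full 20 cores are usable, which is the point of the next paragraph.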

So 64G would be my target. And given that the CPU and GPU share that memory, that’s a good target for me. However…

Availability of Mac Studios with the exact configuration I wanted has been abysmal since… well… introduction. I wanted M1 Ultra with 64-core GPU, 64G RAM, 2TB storage. Apple’s lead time for this or anything close: 12 weeks. I’m assuming that a lot of this is the ongoing supply chain issues, COVID and possibly yield issues for the M1 Ultra. Apple is missing out on revenue here, so it’s not some sort of intentional move on their part, as near as I can tell. While I think there are M2 Pro and M2 Max on the horizon for the MacBook Pro (I dunno, 1H2023?), I think it’ll be a year before I see something clearly better for my use than the M1 Ultra. I can’t wait a year, unfortunately. I also can’t wait 3 months.

In fact, since I’m closing in on finishing the den, and need to move my office there, this is now urgent just from a space and aesthetics perspective. I intentionally designed the desk overbridges in the den to comfortably accommodate a Mac Studio (or Mac Mini) underneath either side. I DON’T want my hackintosh in this room! I want quiet, aesthetically pleasing, small, inconspicuous, efficient, and not a major source of heat. I need 10G ethernet. Fortunately, the Mac Studio ticks all of the boxes.

Today I picked up what was available, not exactly what I wanted. It’s an open box and hence $500 off: a Mac Studio with M1 Ultra, 64-core GPU, 128G of RAM and 1TB storage. The only thing from my wishlist not met here: 2TB storage. However, I’m only using 45% of the space on my 1TB drive in my hackintosh, and I haven’t tried to clean up much. I don’t keep music and movies on my desktop machine, but if I wanted to with the Mac Studio, I could plug in Thunderbolt 4 storage.

I’m much more excited about moving into the den than I am about the new computer. That’s unlikely to change, since the den remodeling is the culmination of a lot of work. And I know that I’m going to have to fiddle to make the new Mac Studio work well with my Dell U3818DW display. Assuming that goes well, I’m sure I’ll have a positive reaction to the Mac Studio. The Geekbench single-core scores are double that of my hackintosh. The multi-core scores are 7 times higher. This just gives me confidence that I’ll notice the speed when using it for my work. Especially since the storage is roughly a decimal order of magnitude faster. The 2TB SSD would have been somewhat faster still, but the big jump for my desktop is going from SATA to NVMe. I notice this in my Threadripper machine and I’ll notice it here.

My main concern long-term is the cooling system. Being a custom solution from Apple, I don’t have options when the blower fans fail. Hopefully Apple will extend repairability beyond my first 3 years of AppleCare+. I like keeping my main desktop for more than 3 years. While in some ways it’s the easiest one to replace since it’s not rackmounted and isn’t critical to other infrastructure, it’s also my primary interface to all of my other machines: the Threadripper workstation for Linux and Windows development, my network storage machine, my web server, my gateway, and of course the web, Messages, email, Teams, Discord, etc. It saves me time and money if it lasts awhile.

UPS fiasco and mcrover to the rescue

I installed a new Eaton 5PX1500RT in my basement rack this week. I’d call it “planned, sort of…”. My last Powerware 5115 1U UPS went into an odd state which precipitated the new purchase. However, it was on my todo list to make this change.

I already own an Eaton 5PX1500RT, which I bought in 2019. I’ve been very happy with it. It’s in the basement rack, servicing a server, my gateway, ethernet switches and broadband modem. As is my desire, it is under 35% load.

The Powerware 5115 was servicing my storage server, and also under 35% load. This server has dual redundant 900W power supplies.

Installation of the new UPS… no big deal. Install the ears, install the rack rails, rack the UPS.

Shut down the devices plugged into the old UPS, plug them in to the new UPS. Boot up, check each device.

Install the USB cable from the UPS to the computer that will monitor the state of the UPS. Install Network UPS Tools (nut) on that computer. Configure it, start it, check it.
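For what it’s worth, the nut configuration for this step is small. A minimal sketch of the driver entry (the section name and desc string here are made up for illustration; the path is where the FreeBSD port puts its config):

```ini
; /usr/local/etc/nut/ups.conf -- minimal sketch, names are placeholders
[eaton5px]
    driver = usbhid-ups
    port = auto
    desc = "Eaton 5PX1500RT"
```

Then `upsdrvctl start` brings up the driver, and `upsc eaton5px@localhost` dumps the UPS variables so you can check that it’s actually talking to the hardware.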

This week, at this step things got… interesting.

I was monitoring the old Powerware 5115 from ‘ria’. ‘ria’ is a 1U SuperMicro server with a single Xeon E3-1270 V2. It has four 1G ethernet ports and a Mellanox 10G SFP+ card. Two USB ports. And a serial port which has been connected to the Powerware 5115 for… I don’t know, 8 years?

I can monitor the Eaton 5PX1500RT via a serial connection. However, USB is more modern, right? And the cables are less unwieldy (more wieldy). So I used the USB cable.

Trouble started here. The usbhid-ups driver did not reliably connect to the UPS. When it did, it took a long time (in excess of 5 seconds, an eternity in computing time). ‘ria’ is running FreeBSD 12.3-STABLE on bare metal.

I initially decided that I’d deal with it this weekend. Either go back to using a serial connection or try using a host other than ‘ria’. However…

I soon noticed long periods where mcrover was displaying alerts for many services on many hosts. Including alerts for local services, whose test traffic does not traverse the machine I touched (‘ria’). And big delays when using my web browser. Hmm…

Poking around, I seemed to only be able to reliably reproduce a network problem by pinging certain hosts with ICMPv4 from ria and observing periods where the round trip time would go from .05 milliseconds to 15 or 20 seconds. No packets lost, just periods with huge delays. These were all hosts on the same 10G ethernet network. ICMPv6 to the same hosts: no issues. Hmm…

I was eventually able to correlate (in my head) what I was seeing in the many mcrover alerts. On the surface, many didn’t involve ‘ria’. But under the hood they DO involve ‘ria’ simply because ‘ria’ is my primary name server. So, for example, tests that probe via both IPv6 and IPv4 might get the AAAA record but not the A record for the destination, or vice versa, or neither, or both. ‘ria’ is also the default route for these hosts. I homed in on the 10G ethernet interface on ‘ria’.

What did IPv4 versus IPv6 have to do with the problem? I don’t know without digging through kernel source. What was happening: essentially a network ‘pause’. Packets destined for ‘ria’ were not dropped, but queued for later delivery. As many as 20 seconds later! The solution? Unplug the USB cable for the UPS and kill usbhid-ups. In the FreeBSD kernel, is USB hoarding a lock shared with part of the network stack?

usbhid-ups works from another Supermicro server running the same version of FreeBSD. Different hardware (dual Xeon L5640). Same model of UPS with the same firmware.

This leads me to believe this isn’t really a lock issue. It’s more likely an interrupt routing issue. And I do remember that I had to add hw.acpi.sci.polarity="low" to /boot/loader.conf on ‘ria’ a while ago to avoid acpi0 interrupt storms (commented out recently with no observed consequence). What I don’t remember: what were all the issues I found that prompted me to add that line way back when?

Anyway… today’s lesson. Assume the last thing you changed has a high probability of being the cause, even if there seems to be no sensible correlation. My experience this week: “Unplug the USB connection to the UPS and the 10G ethernet starts working again. Wait, what?!”.

And today’s thanks goes to mcrover. I might not have figured this out for considerably longer if I did not have alert information in my view. Being a comes-and-goes problem that only seemed to be reproducible between particular hosts using particular protocols might have made this a much more painful problem to troubleshoot without reliable status information on a dedicated display. Yes, it took some thinking and observing, and then some manual investigation and backtracking. But the whole time, I had a status display showing me what was observable. Nice!